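Ambiguity Word Segmentation Based on Two Chinese Characters Used as a Word in Chinese
Cite this article: Zheng Dequan, Yu Feng, Wang Kaitao, Zhao Tiejun. Ambiguity Word Segmentation Based on Two Chinese Characters Used as a Word in Chinese[J]. Computer Engineering and Applications, 2003, 39(1): 17-18, 26.
Authors: Zheng Dequan  Yu Feng  Wang Kaitao  Zhao Tiejun
Affiliations: 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001
2. Computing Center, Harbin University of Commerce, Harbin 150028
Funding: National 863 High-Tech Research and Development Program of China (No. 2001AA114101)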
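Abstract: This paper proposes a new method that uses two-character strings that should form words in Chinese to compute two statistical measures between adjacent characters within a Chinese sentence: the mutual information and the difference of t-information. These two measures are then applied to the automatic segmentation of ambiguous strings in Chinese word segmentation. Experimental results show that the method segments ambiguous strings with an accuracy of 90%. Compared with other segmentation methods, it further improves the system's segmentation accuracy; in particular, compared with the method described in reference [1], it reduces the system's time complexity on corpora containing a large amount of Chinese text.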

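Keywords: mutual information  difference of t-information  two characters used as a word  automatic word segmentation  ambiguous strings
Article ID: 1002-8331-(2003)01-0017-02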

Ambiguity Word Segmentation Based on Two Chinese Characters Used as a Word in Chinese
Zheng Dequan, Yu Feng, Wang Kaitao, Zhao Tiejun. Ambiguity Word Segmentation Based on Two Chinese Characters Used as a Word in Chinese[J]. Computer Engineering and Applications, 2003, 39(1): 17-18, 26.
Authors: Zheng Dequan  Yu Feng  Wang Kaitao  Zhao Tiejun
Affiliation: 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001; 2. Computing Center, Harbin University of Commerce, Harbin 150028
Abstract: This paper presents a new method for computing two statistical measures between adjacent characters in a Chinese sentence, the mutual information and the difference of t-information, by making use of two-character strings that should form words in Chinese. The two measures are then applied to the automatic segmentation of ambiguous strings in Chinese. Test results show that ambiguous strings are segmented with an accuracy of 90%. Compared with other methods, the approach improves the accuracy of automatic word segmentation; in particular, compared with the method of reference [1], it reduces time complexity on corpora containing a large amount of Chinese text.
Keywords: mutual information  difference of t-information  two Chinese characters used as a word  automatic word segmentation  ambiguous strings
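
The abstract names mutual information between adjacent characters as one of the two statistics used to decide where an ambiguous string should be cut. The sketch below illustrates only the standard pointwise mutual information, log2(p(xy) / (p(x) p(y))), estimated from raw character and bigram counts. It is an assumption-laden illustration, not the authors' procedure: the abstract does not give the paper's estimation scheme based on two-character words, nor the definition of the difference of t-information, so neither is attempted here. The function name `adjacent_mutual_information` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def adjacent_mutual_information(corpus, sentence):
    """Return a list of (pair, MI) for each adjacent character pair in `sentence`.

    Probabilities are plain relative frequencies over `corpus`
    (an assumption; the paper's own estimates are not given in the abstract).
    """
    unigrams = Counter()
    bigrams = Counter()
    for line in corpus:
        unigrams.update(line)
        bigrams.update(line[i:i + 2] for i in range(len(line) - 1))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = []
    for i in range(len(sentence) - 1):
        x, y = sentence[i], sentence[i + 1]
        joint = bigrams[x + y]
        if joint == 0 or unigrams[x] == 0 or unigrams[y] == 0:
            # Unseen pair: treat as minimally associated.
            scores.append((x + y, float("-inf")))
            continue
        p_xy = joint / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores.append((x + y, math.log2(p_xy / (p_x * p_y))))
    return scores


if __name__ == "__main__":
    # Toy corpus invented for illustration only.
    toy_corpus = ["研究生命", "研究生", "生命", "研究"]
    for pair, mi in adjacent_mutual_information(toy_corpus, "研究生命"):
        print(pair, mi)
```

Under this reading, a pair with high mutual information is more likely to belong to the same word, and a pair with low mutual information is a candidate cut point; how such scores are combined with the difference of t-information is described in the paper itself, not here.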
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.