
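Evaluation of Several Dictionary-Based Chinese Word Segmentation Algorithms
Cite this article: LI Dan-ning, LI Dan, WANG Bao-hua, MA Xin-qiang. Evaluation of Several Dictionary-Based Chinese Word Segmentation Algorithms [J]. Guizhou Science, 2008, 26(3)
Authors: LI Dan-ning  LI Dan  WANG Bao-hua  MA Xin-qiang
Affiliations: Guizhou Academy of Science, Guiyang 550003; School of Information Engineering, Guizhou University, Guiyang 550003
Funding: Annual Plan Project of the Guizhou Provincial Department of Science and Technology, grant Qian Ke He (2004) JN057
Abstract: Dictionary-based automatic Chinese word segmentation is the foundation of Chinese information processing. Guided by the principle of optimizing the use of the computer's cache, this paper analyzes several typical segmentation dictionary mechanisms and points out some of their problems. The whole-word binary search method is improved, greatly increasing its speed. Combining a hash index with the PATRICIA tree search algorithm, a comprehensively optimized Chinese word segmentation system is proposed.

Keywords: Chinese information processing  automatic word segmentation  segmentation dictionary  cache optimization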

THE EVALUATION OF SEVERAL ALGORITHMS FOR DICTIONARY-BASED CHINESE WORD SEGMENTATION
LI Dan-ning, LI Dan, WANG Bao-hua, MA Xin-qiang. THE EVALUATION OF SEVERAL ALGORITHMS FOR DICTIONARY-BASED CHINESE WORD SEGMENTATION [J]. Guizhou Science, 2008, 26(3)
Authors: LI Dan-ning  LI Dan  WANG Bao-hua  MA Xin-qiang
Affiliation: 1. Guizhou Academy of Science, Guiyang 550001; 2. School of Information Engineering, Guizhou University, Guiyang 550003
Abstract: This paper discusses several typical dictionary-based Chinese word segmentation algorithms and identifies existing problems in them. The whole-word binary search method (binary-seek-by-word) is improved by optimizing its use of the computer's cache. By combining a hash index with the PATRICIA tree search mechanism, an optimized, comprehensive Chinese word segmentation method is proposed.
Keywords: Chinese information processing  Chinese word segmentation  segmentation dictionary  cache optimization
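
The abstract names a hash index and a whole-word binary search but does not give the algorithms themselves. As a rough illustration of how such a dictionary lookup can drive segmentation, here is a minimal sketch. It is not the authors' implementation: the function names, the toy dictionary, the use of forward maximum matching, and the max_len limit are all assumptions made for this example, and the PATRICIA tree and cache optimizations are not modeled.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(words):
    """Group words by first character (the hash index) and sort each group."""
    index = defaultdict(list)
    for w in set(words):
        if w:  # skip empty strings
            index[w[0]].append(w)
    for group in index.values():
        group.sort()  # sorted order is required for binary search
    return dict(index)


def contains(index, word):
    """Hash lookup on the first character, then binary search on whole words."""
    group = index.get(word[0])
    if not group:
        return False
    i = bisect_left(group, word)
    return i < len(group) and group[i] == word


def segment(index, text, max_len=4):
    """Forward maximum matching (an assumed strategy, not taken from the paper).

    At each position, try the longest candidate first; fall back to a
    single character when no dictionary word matches.
    """
    out, pos = [], 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            if size == 1 or contains(index, piece):
                out.append(piece)
                pos += size
                break
    return out


# Hypothetical toy dictionary, for illustration only.
toy_index = build_index(["中文", "信息", "处理", "信息处理", "分词", "词典"])
print(segment(toy_index, "中文信息处理的分词词典"))
# prints ['中文', '信息处理', '的', '分词', '词典']
```

Grouping by first character keeps each binary search confined to one short sorted list, which is one plausible reason such a layout can be friendlier to the cache than searching a single large array.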
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.