
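Bidirectional Maximum Matching Word Segmentation Based on a Hash-Structured Dictionary (基于Hash结构词典的双向最大匹配分词法)
Cite this article: 陈之彦, 李晓杰, 朱淑华, 付丹龙, 邢诒海. 基于Hash结构词典的双向最大匹配分词法[J]. 计算机科学, 2015, 42(Z11): 49-54.
Authors: 陈之彦, 李晓杰, 朱淑华, 付丹龙, 邢诒海
Affiliations (in author order): International School, Jinan University, Guangzhou 510632; International School, Jinan University, Guangzhou 510632; School of Information Science and Technology, Jinan University, Guangzhou 510632; School of Information Science and Technology, Jinan University, Guangzhou 510632; Guangzhou Economic and Trade Information Center, Guangzhou 510032
Funding: Supported by the National Natural Science Foundation of China (61272415,7), the National 863 Program Major Project (2013AA01A212), the Guangdong Natural Science Foundation Team Research Project (S2012030006242), and the Guangzhou Key Laboratory Open Fund (2012-224).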
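Abstract: Dictionary-based mechanical word segmentation for Chinese natural language processing faces two problems: a forward-ordered dictionary cannot serve as the dictionary for reverse maximum matching, and a reverse-ordered dictionary is difficult to maintain. To address them, this paper proposes a new dictionary construction method and designs a corresponding bidirectional maximum matching algorithm. A mutual-information ambiguity-handling module is added to the algorithm to resolve the intersection-type (overlapping) ambiguities that arise during segmentation. The algorithm can significantly improve segmentation precision and suits Chinese language processing systems that require high word-segmentation precision.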

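Keywords: segmentation dictionary; bidirectional maximum matching; Hash-based single-character index; mutual-information ambiguity handling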
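The record above does not spell out the dictionary layout, so the following is only a minimal sketch of the general idea: a single word list indexed by hash tables so that both forward and reverse maximum matching can use it. The class `HashDictionary`, its first-character and last-character indexes, and the tie-breaking rule in `bidirectional_mm` (fewer tokens, then fewer single-character tokens, otherwise the reverse result) are assumptions made for illustration, not the structure or rule from the paper.

```python
from collections import defaultdict


class HashDictionary:
    """Toy dictionary with hash indexes on each word's first and last character.

    Assumption for illustration only; the paper's actual structure is not shown here.
    """

    def __init__(self, words):
        self.by_first = defaultdict(set)  # first character -> words starting with it
        self.by_last = defaultdict(set)   # last character  -> words ending with it
        for word in words:
            if word:
                self.by_first[word[0]].add(word)
                self.by_last[word[-1]].add(word)

    def longest_prefix(self, text, start):
        """Longest dictionary word starting at text[start]; falls back to one character."""
        best = text[start]
        for word in self.by_first.get(text[start], ()):
            if len(word) > len(best) and text.startswith(word, start):
                best = word
        return best

    def longest_suffix(self, text, end):
        """Longest dictionary word ending just before index end; falls back to one character."""
        best = text[end - 1]
        for word in self.by_last.get(text[end - 1], ()):
            if len(word) > len(best) and text.endswith(word, 0, end):
                best = word
        return best


def forward_mm(dictionary, text):
    """Forward maximum matching: scan left to right, taking the longest word each time."""
    tokens, i = [], 0
    while i < len(text):
        word = dictionary.longest_prefix(text, i)
        tokens.append(word)
        i += len(word)
    return tokens


def backward_mm(dictionary, text):
    """Reverse maximum matching: scan right to left, taking the longest word each time."""
    tokens, end = [], len(text)
    while end > 0:
        word = dictionary.longest_suffix(text, end)
        tokens.append(word)
        end -= len(word)
    return tokens[::-1]


def bidirectional_mm(dictionary, text):
    """Run both directions; on disagreement use a common heuristic (an assumption here)."""
    fwd, bwd = forward_mm(dictionary, text), backward_mm(dictionary, text)
    if fwd == bwd:
        return fwd

    def cost(tokens):
        # Fewer tokens first, then fewer single-character tokens.
        return (len(tokens), sum(len(t) == 1 for t in tokens))

    return fwd if cost(fwd) < cost(bwd) else bwd


if __name__ == "__main__":
    d = HashDictionary(["研究", "研究生", "生命", "命", "的", "起源"])
    print(forward_mm(d, "研究生命的起源"))
    print(backward_mm(d, "研究生命的起源"))
    print(bidirectional_mm(d, "研究生命的起源"))
```

Because the same word set feeds both indexes, adding a word updates forward and reverse lookup together, which is one way to avoid maintaining a separate reversed dictionary.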

Bi-direction Maximum Matching Method Based on Hash Structural Dictionary
CHEN Zhi-yan, LI Xiao-jie, ZHU Shu-hu, FU Dan-long and XING Yi-hai. Bi-direction Maximum Matching Method Based on Hash Structural Dictionary[J]. Computer Science, 2015, 42(Z11): 49-54.
Authors: CHEN Zhi-yan, LI Xiao-jie, ZHU Shu-hu, FU Dan-long and XING Yi-hai
Affiliation: International School, Jinan University, Guangzhou 510632, China; International School, Jinan University, Guangzhou 510632, China; School of Information Science and Technology, Jinan University, Guangzhou 510632, China; School of Information Science and Technology, Jinan University, Guangzhou 510632, China; and Guangzhou City Economic and Trade Information Center, Guangzhou 510632, China
Abstract: In Chinese natural language processing, an ordinary forward-ordered dictionary cannot be used for the reverse maximum matching method, and a reverse dictionary is difficult to maintain. To address these problems, we put forward a new dictionary structure and a corresponding bi-direction maximum matching method, and add a mutual information ambiguity processing module to the algorithm. Compared with the previous maximum matching method, this algorithm increases segmentation accuracy significantly. It is applicable to Chinese natural language processing systems that require high segmentation accuracy.
Keywords: Segmentation dictionary; Bi-direction maximum matching method; Single-character index based on Hash structure; Mutual information ambiguity processing
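As a rough illustration of how mutual information can arbitrate an overlapping ambiguity, consider a string a+b+c in which both a+b and b+c are dictionary words. The sketch below scores each candidate join with pointwise mutual information, log2(p(x,y) / (p(x) p(y))), and keeps the more strongly associated pair. The function names, the count tables, and the decision rule are assumptions for illustration; the abstract does not give the exact formulation used in the paper, and the counts in the usage example are invented toy numbers, not corpus statistics.

```python
import math


def mutual_information(pair_count, left_count, right_count, total):
    """Pointwise mutual information log2(p(x,y) / (p(x) * p(y))) from raw counts.

    Returns -inf when any count is zero, so unseen pairs never win a comparison.
    """
    if pair_count == 0 or left_count == 0 or right_count == 0:
        return float("-inf")
    p_xy = pair_count / total
    p_x = left_count / total
    p_y = right_count / total
    return math.log2(p_xy / (p_x * p_y))


def resolve_overlap(a, b, c, unigram, bigram, total):
    """Choose between [a+b, c] and [a, b+c] for an overlapping string a+b+c.

    unigram maps a unit to its count, bigram maps (left, right) to the count of
    the adjacent pair, and total is the normalising count (all hypothetical inputs).
    Ties, including the case where both pairs are unseen, fall back to [a, b+c].
    """
    mi_ab = mutual_information(bigram.get((a, b), 0), unigram.get(a, 0), unigram.get(b, 0), total)
    mi_bc = mutual_information(bigram.get((b, c), 0), unigram.get(b, 0), unigram.get(c, 0), total)
    if mi_ab > mi_bc:
        return [a + b, c]
    return [a, b + c]


if __name__ == "__main__":
    # Invented toy counts, for demonstration only.
    unigram = {"研究": 50, "生": 80, "命": 40}
    bigram = {("研究", "生"): 5, ("生", "命"): 30}
    print(resolve_overlap("研究", "生", "命", unigram, bigram, total=1000))
```

With these toy counts the pair (生, 命) has the higher score, so the overlap 研究生命 is split as 研究 / 生命.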