Chinese Multi-word Expression Extraction Based on an Improved DE-Tri-Training Algorithm
Cite this article: Liang Yinghong, Tan Hongye, Xian Xuefeng, Huang Dandan, Qian Haizhong, Shen Chunze. Chinese Multi-word Expression Extraction Based on an Improved DE-Tri-Training Algorithm[J]. Journal of Data Acquisition and Processing (数据采集与处理), 2017, 32(1): 141-148.
Authors: Liang Yinghong, Tan Hongye, Xian Xuefeng, Huang Dandan, Qian Haizhong, Shen Chunze
Affiliations: 1. School of Software Engineering, Jinling Institute of Technology, Nanjing 211169; 2. School of Computer and Information Technology, Shanxi University, Taiyuan 030006; 3. School of Computer Engineering, Suzhou Vocational University, Suzhou 215104
Abstract: Misidentifying multi-word expressions (MWEs) can adversely affect many natural language processing (NLP) tasks. Because annotated corpora for Chinese MWEs are scarce, a semi-supervised method is used to extract them. The DE-Tri-Training semi-supervised clustering algorithm uses supervised label information in the initial stage of clustering and achieves good extraction results. This paper proposes a method for determining the initial cluster centers based on head-word expansion, and a consistency-based collaborative-learning data purification method that relies on supervised information. Together these form a semi-supervised strategy for extracting Chinese MWEs in which supervised information is also introduced in the middle and late stages of clustering, so that the classifiers can be trained on correct label information. In comparative experiments against the DE-Tri-Training algorithm, the improved algorithm yields better Chinese MWE extraction results than the original, which verifies its effectiveness.

Keywords: multi-word expression; semi-supervised; co-training; tri-training
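
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it points to: a tri-training style agreement step, followed by a filter that uses known labels to discard contradicted pseudo-labels. It is generic tri-training under stated assumptions, not the paper's DE-Tri-Training procedure. The function names `agreement_round` and `purify`, the labels `"MWE"` and `"other"`, and the representation of candidates as strings are all hypothetical, and the error-rate checks of full tri-training are omitted.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types: a candidate is a string, a classifier maps it to a label.
Classifier = Callable[[str], str]
Labeled = Tuple[str, str]


def agreement_round(
    classifiers: Sequence[Classifier], unlabeled: Sequence[str]
) -> Dict[int, List[Labeled]]:
    """For each of three classifiers, collect the unlabeled candidates on
    which the OTHER two classifiers agree, paired with the agreed label."""
    if len(classifiers) != 3:
        raise ValueError("tri-training uses exactly three classifiers")
    proposals: Dict[int, List[Labeled]] = {0: [], 1: [], 2: []}
    for x in unlabeled:
        preds = [clf(x) for clf in classifiers]
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            if preds[j] == preds[k]:
                proposals[i].append((x, preds[j]))
    return proposals


def purify(proposed: Sequence[Labeled], seed_labels: Dict[str, str]) -> List[Labeled]:
    """Assumed purification rule: drop a pseudo-label when a known
    (supervised) label for the same candidate contradicts it."""
    return [(x, y) for x, y in proposed if seed_labels.get(x, y) == y]


if __name__ == "__main__":
    # Toy rule-based classifiers standing in for trained models.
    clf0 = lambda x: "MWE" if x in {"a b", "c d"} else "other"
    clf1 = lambda x: "MWE" if x in {"a b"} else "other"
    clf2 = lambda x: "MWE" if x in {"a b", "e f"} else "other"

    proposals = agreement_round([clf0, clf1, clf2], ["a b", "c d", "e f"])
    seeds = {"c d": "MWE"}  # supervised information available mid-training
    for i in range(3):
        print(i, purify(proposals[i], seeds))
```

In this sketch each classifier would then be retrained on its purified proposals plus the original labeled data. How the paper selects cluster centers or defines consistency is not stated in the abstract and is not modeled here.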