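Bilingual collaborative Chinese relation extraction based on parallel corpus
Cite this article: GUO Bo, FENG Xupeng, LIU Lijun, HUANG Qingsong. Bilingual collaborative Chinese relation extraction based on parallel corpus[J]. Journal of Computer Applications, 2017, 37(4): 1051-1055.
Authors: GUO Bo  FENG Xupeng  LIU Lijun  HUANG Qingsong
Affiliations: 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; 2. Educational Technology and Network Center, Kunming University of Science and Technology, Kunming 650500; 3. Yunnan Provincial Key Laboratory of Computer Technology Applications (Kunming University of Science and Technology), Kunming 650500
Funding: National Natural Science Foundation of China (81360230, 81560296).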
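Abstract: In relation extraction from Chinese resources, long Chinese sentences have complex structure, so syntactic features are hard to extract and their accuracy is low. To address these problems, a bilingual collaborative Chinese relation extraction method based on a parallel corpus is proposed. First, on the English side of a Chinese-English parallel corpus, mature English syntactic analysis tools are used to obtain dependency syntactic features, which are used to train an English relation extraction classifier. This classifier is then paired with a Chinese relation extraction classifier trained on the Chinese side with n-gram features, which suit Chinese, so that the two form a bilingual view. Finally, relying on the parallel corpus after annotation mapping, each classifier's high-reliability instances are added to the other's training corpus for bilingual co-training, which yields a Chinese relation extraction classification model with better performance. Experiments on a Chinese test corpus show that the method improves the performance of weakly supervised Chinese relation extraction, raising the F-value by 3.9 percentage points.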
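The abstract does not say whether the Chinese n-gram features are character-level or word-level, nor which values of n are used. As a minimal sketch, assuming character-level n-grams with n from 1 to 3 (the function name `char_ngrams` and its parameters are hypothetical, not taken from the paper), such features can be extracted without any syntactic parsing:

```python
from collections import Counter


def char_ngrams(text, n_values=(1, 2, 3)):
    """Count character n-grams in `text`, ignoring whitespace.

    Assumption: character-level n-grams; the paper's abstract only says
    "n-gram features" and gives no further detail.
    """
    chars = [c for c in text if not c.isspace()]
    features = Counter()
    for n in n_values:
        # Slide a window of width n over the character sequence.
        for i in range(len(chars) - n + 1):
            features["".join(chars[i:i + n])] += 1
    return features


bigrams = char_ngrams("昆明理工", n_values=(2,))
print(bigrams)
```

Because the features are plain substrings, they avoid the parsing errors that long Chinese sentences tend to cause, which is the motivation the abstract gives for using them on the Chinese side.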

Keywords: weakly supervised learning; relation extraction; n-gram; parallel corpus; bilingual co-training
Received: 2016-09-26
Revised: 2016-12-21

Bilingual collaborative Chinese relation extraction based on parallel corpus
GUO Bo, FENG Xupeng, LIU Lijun, HUANG Qingsong. Bilingual collaborative Chinese relation extraction based on parallel corpus[J]. Journal of Computer Applications, 2017, 37(4): 1051-1055.
Authors: GUO Bo  FENG Xupeng  LIU Lijun  HUANG Qingsong
Affiliation: 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming Yunnan 650500, China; 2. Educational Technology and Network Center, Kunming University of Science and Technology, Kunming Yunnan 650500, China; 3. Yunnan Provincial Key Laboratory of Computer Technology Applications (Kunming University of Science and Technology), Kunming Yunnan 650500, China
Abstract: In relation extraction from Chinese resources, long Chinese sentences have complex structure, so syntactic features are hard to extract and their accuracy is low. To address these problems, a bilingual collaborative Chinese relation extraction method based on a parallel corpus is proposed. On the English side of a Chinese-English parallel corpus, an English relation extraction classifier is trained on dependency syntactic features obtained with mature English syntactic analysis tools. On the Chinese side, a Chinese relation extraction classifier is trained on n-gram features, which suit Chinese. Together the two classifiers constitute a bilingual view. Finally, based on the annotated and mapped parallel corpus, the instances that each classifier labels with high reliability are added to the other classifier's training corpus for bilingual collaborative training, which yields a Chinese relation extraction classification model with better performance. Experimental results on a Chinese test corpus show that the proposed method improves the performance of weakly supervised Chinese relation extraction, raising the F-value by 3.9 percentage points.
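The bilingual collaborative training step can be illustrated with a small co-training loop. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not name the classifier, so a tiny Naive Bayes (`BagNB`) stands in for both views; `co_train`, `rounds`, and `threshold` are hypothetical names; and the feature strings and toy sentence pairs are invented. Each unlabeled item is an aligned Chinese/English pair, so a label predicted confidently in one view can be handed to the other view's training set.

```python
import math
from collections import Counter


class BagNB:
    """Tiny multinomial Naive Bayes over bags of string features (stand-in classifier)."""

    def fit(self, bags, labels):
        self.labels = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {y: Counter() for y in self.labels}
        for bag, y in zip(bags, labels):
            self.counts[y].update(bag)
        self.vocab = set().union(*self.counts.values())
        return self

    def predict_proba(self, bag):
        total = sum(self.prior.values())
        logp = {}
        for y in self.labels:
            # Laplace smoothing; the extra +1 reserves mass for unseen features.
            denom = sum(self.counts[y].values()) + len(self.vocab) + 1
            lp = math.log(self.prior[y] / total)
            for f in bag:
                lp += math.log((self.counts[y][f] + 1) / denom)
            logp[y] = lp
        m = max(logp.values())
        z = sum(math.exp(v - m) for v in logp.values())
        return {y: math.exp(v - m) / z for y, v in logp.items()}


def best(proba):
    """Return the (label, probability) pair with the highest probability."""
    return max(proba.items(), key=lambda kv: kv[1])


def co_train(zh_clf, en_clf, labeled, unlabeled, rounds=3, threshold=0.7):
    """labeled: (zh_bag, en_bag, label) triples; unlabeled: aligned (zh_bag, en_bag) pairs."""
    zh_train, en_train, pool = list(labeled), list(labeled), list(unlabeled)
    for _ in range(rounds):
        zh_clf.fit([z for z, _, _ in zh_train], [y for _, _, y in zh_train])
        en_clf.fit([e for _, e, _ in en_train], [y for _, _, y in en_train])
        still_unlabeled = []
        for zh_bag, en_bag in pool:
            zh_label, zh_conf = best(zh_clf.predict_proba(zh_bag))
            en_label, en_conf = best(en_clf.predict_proba(en_bag))
            moved = False
            if en_conf >= threshold:  # English view teaches the Chinese view
                zh_train.append((zh_bag, en_bag, en_label))
                moved = True
            if zh_conf >= threshold:  # Chinese view teaches the English view
                en_train.append((zh_bag, en_bag, zh_label))
                moved = True
            if not moved:
                still_unlabeled.append((zh_bag, en_bag))
        if len(still_unlabeled) == len(pool):
            break  # no confident prediction in either view; stop early
        pool = still_unlabeled
    # Refit so the returned Chinese model includes the last additions.
    return zh_clf.fit([z for z, _, _ in zh_train], [y for _, _, y in zh_train])


# Invented toy data: Chinese bags are n-grams, English bags mimic dependency features.
labeled = [
    (["出生", "生于"], ["dep:born", "prep:in"], "born_in"),
    (["就职", "职于"], ["dep:works", "prep:for"], "works_for"),
]
unlabeled = [(["生在", "故乡"], ["dep:born", "prep:in"])]

baseline = BagNB().fit([z for z, _, _ in labeled], [y for _, _, y in labeled])
model = co_train(BagNB(), BagNB(), labeled, unlabeled)
print(baseline.predict_proba(["生在"]))
print(model.predict_proba(["生在"]))
```

In the toy run, the Chinese features of the unlabeled pair never occur in the seed data, so the baseline Chinese classifier cannot separate the two relations. The English view recognises the pair confidently, and its label is what lets the Chinese model learn the new n-grams.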
Keywords: weakly-supervised learning; relation extraction; n-gram; parallel corpus; bilingual collaborative training