
Bilingual collaborative Chinese relation extraction based on parallel corpus

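Cite this article:	GUO Bo, FENG Xupeng, LIU Lijun, HUANG Qingsong. Bilingual collaborative Chinese relation extraction based on parallel corpus[J]. Journal of Computer Applications, 2017, 37(4): 1051-1055.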

Authors:	GUO Bo, FENG Xupeng, LIU Lijun, HUANG Qingsong

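Affiliations:	1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; 2. Educational Technology and Network Center, Kunming University of Science and Technology, Kunming 650500; 3. Yunnan Provincial Key Laboratory of Computer Technology Applications (Kunming University of Science and Technology), Kunming 650500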

Funding:	National Natural Science Foundation of China (81360230, 81560296).

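Abstract:	Relation extraction from Chinese text is hampered by the complex structure of long Chinese sentences, which makes syntactic features hard to extract and lowers their accuracy. To address this, a bilingual collaborative Chinese relation extraction method based on a parallel corpus is proposed. First, mature English syntactic parsing tools are applied to the English side of a Chinese-English parallel corpus, and the resulting dependency syntax features are used to train an English relation extraction classifier. This classifier is then paired with a Chinese relation extraction classifier, trained on the Chinese side with n-gram features suited to Chinese, so that the two form a bilingual view. Finally, using the parallel corpus after annotation mapping, each classifier's high-reliability examples are added to the other's training corpus for bilingual collaborative training, which yields a Chinese relation extraction classification model with better performance. Experiments on a Chinese test corpus show that the method improves weakly supervised Chinese relation extraction, raising the F value by 3.9 percentage points.
Keywords:	weakly supervised learning; relation extraction; n-gram; parallel corpus; bilingual collaborative training
Received:	2016-09-26
Revised:	2016-12-21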
Bilingual collaborative Chinese relation extraction based on parallel corpus

GUO Bo, FENG Xupeng, LIU Lijun, HUANG Qingsong. Bilingual collaborative Chinese relation extraction based on parallel corpus[J]. Journal of Computer Applications, 2017, 37(4): 1051-1055.

Authors:	GUO Bo, FENG Xupeng, LIU Lijun, HUANG Qingsong

Affiliation:	1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; 2. Educational Technology and Network Center, Kunming University of Science and Technology, Kunming, Yunnan 650500, China; 3. Yunnan Provincial Key Laboratory of Computer Technology Applications (Kunming University of Science and Technology), Kunming, Yunnan 650500, China

Abstract:	In relation extraction from Chinese resources, long Chinese sentences have complex structures, so syntactic features are difficult to extract and their accuracy is low. To address these problems, a bilingual collaborative relation extraction method based on a parallel corpus was proposed. First, on the English side of a Chinese-English parallel corpus, an English relation extraction classifier was trained on dependency syntactic features obtained with mature English syntactic analysis tools. A Chinese relation extraction classifier was trained on the Chinese side with n-gram features, which are well suited to Chinese, and the two classifiers together formed a bilingual view. Finally, based on the annotated and mapped parallel corpus, the high-reliability examples of each classifier were added to the training corpus of the other for bilingual collaborative training, yielding a Chinese relation extraction classification model with better performance. Experimental results on a Chinese test corpus show that the proposed method improves the performance of weakly supervised Chinese relation extraction, increasing the F value by 3.9 percentage points.

Keywords:	weakly-supervised learning; relation extraction; n-gram; parallel corpus; bilingual collaborative training
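
The bilingual collaborative training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `NaiveBayes` class, the `co_train` function, the confidence `threshold`, and the toy sentences are all hypothetical, and plain lowercase word tokens stand in for the dependency syntactic features that the paper extracts from English.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Chinese view: character n-grams, which need no parser."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_features(text):
    """English view stand-in (assumption): lowercase tokens instead of
    the dependency-syntax features used in the paper."""
    return text.lower().split()


class NaiveBayes:
    """Hypothetical toy classifier: multinomial naive Bayes, add-one smoothing."""

    def fit(self, feature_lists, labels):
        self.label_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(feature_lists, labels):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict_proba(self, feats):
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            scores[label] = math.log(count / total) + sum(
                math.log((self.feat_counts[label][f] + 1) / denom) for f in feats)
        top = max(scores.values())  # subtract the max before exp for stability
        exp = {label: math.exp(s - top) for label, s in scores.items()}
        norm = sum(exp.values())
        return {label: v / norm for label, v in exp.items()}

    def predict(self, feats):
        probs = self.predict_proba(feats)
        return max(probs, key=probs.get)


def co_train(zh_labeled, en_labeled, parallel, rounds=3, threshold=0.7):
    """zh_labeled, en_labeled: lists of (sentence, relation) pairs.
    parallel: unlabeled, sentence-aligned (zh_sentence, en_sentence) pairs."""
    zh_train, en_train, pool = list(zh_labeled), list(en_labeled), list(parallel)
    zh_clf, en_clf = NaiveBayes(), NaiveBayes()

    def refit():
        zh_clf.fit([char_ngrams(s) for s, _ in zh_train], [y for _, y in zh_train])
        en_clf.fit([word_features(s) for s, _ in en_train], [y for _, y in en_train])

    for _ in range(rounds):
        refit()
        remaining = []
        for zh, en in pool:
            zh_probs = zh_clf.predict_proba(char_ngrams(zh))
            en_probs = en_clf.predict_proba(word_features(en))
            zh_label = max(zh_probs, key=zh_probs.get)
            en_label = max(en_probs, key=en_probs.get)
            used = False
            if en_probs[en_label] >= threshold:  # English teaches Chinese
                zh_train.append((zh, en_label))
                used = True
            if zh_probs[zh_label] >= threshold:  # Chinese teaches English
                en_train.append((en, zh_label))
                used = True
            if not used:
                remaining.append((zh, en))
        if len(remaining) == len(pool):  # no confident example this round
            break
        pool = remaining
    refit()  # include examples added in the last round
    return zh_clf, en_clf


# Toy data, invented purely for illustration.
zh_seed = [("张三出生于北京", "born_in"), ("李四就职于谷歌", "works_for")]
en_seed = [("Zhang San was born in Beijing", "born_in"),
           ("Li Si works for Google", "works_for")]
pairs = [("王五出生于上海", "Wang Wu was born in Shanghai"),
         ("赵六就职于微软", "Zhao Liu works for Microsoft")]

zh_clf, en_clf = co_train(zh_seed, en_seed, pairs)
print(zh_clf.predict(char_ngrams("陈七出生于广州")))
```

The key design point is that each sentence pair in the parallel corpus carries two independent feature views, so a label that one classifier predicts with high confidence can be transferred across the alignment to become a training example for the other. In this sketch the transfer relies on sentence alignment alone; the annotation mapping step mentioned in the abstract is not modeled.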
