
Attribute Extraction of Chinese Online Encyclopedia Based on Weakly Supervised Learning
JIA Zhen, YANG Yan, HE Da-ke. Attribute Extraction of Chinese Online Encyclopedia Based on Weakly Supervised Learning[J]. Journal of University of Electronic Science and Technology of China, 2014, 43(5): 758-763.
Authors: JIA Zhen, YANG Yan, HE Da-ke
Affiliation: School of Information Science and Technology, Southwest Jiaotong University, Chengdu 610031
Abstract: This paper proposes an attribute extraction method based on weakly supervised learning. The method uses structured attribute information already stored in a knowledge base to acquire a training corpus automatically from natural language text. This effectively addresses the shortage of training data. The automatically acquired corpus contains noise, so the paper proposes a keyword-filtering method to optimize it. It also proposes an n-pattern feature extraction method, which alleviates to some extent the data sparsity problem of traditional n-gram features. The experimental data come from Hudong Baike. Structured attribute information extracted from Hudong Baike infoboxes is used to build the knowledge base, and training and test data are acquired automatically from the text of encyclopedia entries. Experimental results show that keyword filtering effectively improves the quality of the training corpus. They also show that n-pattern features yield better attribute extraction performance than traditional n-gram features.
Keywords: attribute extraction; feature extraction; relation extraction; weakly supervised learning
Received: 2014-02-24
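The abstract describes three steps: weakly supervised corpus acquisition from a knowledge base, keyword filtering of noisy examples, and features that are less sparse than plain n-grams. The toy Python sketch below illustrates these steps under stated assumptions.

- The knowledge base, entry sentences, and keyword lists (`KB`, `SENTENCES`, `KEYWORDS`) are invented for illustration.
- The abstract does not define n-pattern features. The hypothetical `slot_tokens` helper replaces the attribute value with a placeholder before taking n-grams. It only stands in to show how generalizing tokens can reduce sparsity and is not the authors' n-pattern method.

```python
# Toy sketch with hypothetical data and helper names: weak supervision,
# keyword filtering, and character n-gram features.

# Hypothetical (entity, attribute, value) triples, standing in for structured
# attribute information taken from encyclopedia infoboxes.
KB = [
    ("鲁迅", "出生地", "绍兴"),             # attribute: birthplace
    ("鲁迅", "出生日期", "1881年9月25日"),  # attribute: date of birth
]

# Hypothetical sentences from each entity's encyclopedia entry text.
SENTENCES = {
    "鲁迅": [
        "鲁迅1881年9月25日出生于浙江绍兴。",  # "was born ... in Shaoxing": expresses both attributes
        "鲁迅曾在绍兴教书。",                  # "once taught in Shaoxing": value present, attribute absent (noise)
        "鲁迅的代表作有《狂人日记》。",        # contains neither value
    ]
}

# Hypothetical per-attribute keywords ("出生" / "生于" = "born" / "born in").
KEYWORDS = {
    "出生地": ["出生", "生于"],
    "出生日期": ["出生", "生于"],
}


def auto_label(kb, sentences):
    """Weak supervision: a sentence from an entity's entry that contains a known
    attribute value is taken as a (possibly noisy) training example."""
    examples = []
    for entity, attr, value in kb:
        for sent in sentences.get(entity, []):
            if value in sent:
                examples.append((sent, attr, value))
    return examples


def keyword_filter(examples, keywords):
    """Keep an example only if its sentence contains a keyword for its attribute."""
    return [
        (sent, attr, value)
        for sent, attr, value in examples
        if any(k in sent for k in keywords.get(attr, []))
    ]


def ngrams(tokens, n):
    """Contiguous n-grams over a token list (single characters for Chinese text)."""
    return ["".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def slot_tokens(sent, value, slot="<V>"):
    """Hypothetical illustration, NOT the paper's n-pattern definition: replace the
    first occurrence of the attribute value with a single slot token."""
    before, sep, after = sent.partition(value)
    if not sep:  # value not found: fall back to plain characters
        return list(sent)
    return list(before) + [slot] + list(after)


if __name__ == "__main__":
    raw = auto_label(KB, SENTENCES)
    clean = keyword_filter(raw, KEYWORDS)
    print(len(raw), len(clean))  # 3 2: the "教书" (teaching) sentence is filtered out

    a, b = "张三出生于北京。", "李四出生于上海。"
    plain = set(ngrams(list(a), 2)) & set(ngrams(list(b), 2))
    slotted = set(ngrams(slot_tokens(a, "北京"), 2)) & set(ngrams(slot_tokens(b, "上海"), 2))
    print(len(plain), len(slotted))  # 2 4: slotting the value yields more shared features
```

In this toy run, weak labeling produces three examples. One of them, the sentence about teaching in Shaoxing, matches the birthplace value only by coincidence, and the keyword filter removes it. Replacing the value with a slot doubles the number of character bigrams that two otherwise parallel sentences share. This is one simple way a generalized feature can be less sparse than raw n-grams.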
This article is indexed in Wanfang Data and other databases.