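A Method for Chinese Opinion Sentence Identification Based on the Ensemble Classifier with BootStrapping
Cite this article: LV Yunyun, LI Yang, WANG Suge. A Method for Chinese Opinion Sentence Identification Based on the Ensemble Classifier with BootStrapping [J]. Journal of Chinese Information Processing, 2013, 27(5): 84-93.
Authors: LV Yunyun (吕云云), LI Yang (李旸), WANG Suge (王素格)
Affiliations: 1. School of Computer & Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China;
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan, Shanxi 030006, China
Funding: National Natural Science Foundation of China; Natural Science Foundation of Shanxi Province; Shanxi Province Science and Technology Research Project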
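Abstract: Large-scale, high-quality, domain-specific annotated training data is an important guarantee of classifier performance, yet annotating a training corpus is time-consuming and laborious. This paper proposes a method for identifying Chinese opinion sentences using a small-scale annotated corpus. First, the Bootstrapping method is used to expand the training corpus, on which Bayes, support vector machine, and maximum entropy classifiers are trained separately. Finally, an ensemble classifier is obtained by assigning weights to the three trained classifiers. Experimental results show that the ensemble classifier outperforms the single classifiers, and that with only part of the annotated training data the method achieves results close to those obtained with all of it.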

Keywords: opinion sentence identification; BootStrapping; ensemble classifier
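
The Bootstrapping step named in the abstract can be illustrated with a minimal self-training sketch. The record does not give the paper's selection rule, features, or stopping condition, so everything here is an assumption made for illustration: the function names `train_nb`, `predict_proba`, and `bootstrap_expand`, the confidence threshold of 0.8, the round limit, and the use of a tiny bag-of-words naive Bayes model as the base learner. The English tokens are toy stand-ins for segmented Chinese words.

```python
import math
from collections import Counter


def train_nb(samples):
    """Train a tiny multinomial naive Bayes model on (tokens, label) pairs.

    Labels are 1 (opinion sentence) or 0 (non-opinion sentence).
    """
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict_proba(model, tokens):
    """Return P(label = 1 | tokens), using add-one smoothing."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        score = math.log((class_counts[label] + 1) / (total + 2))
        for token in tokens:
            # The extra +1 in the denominator reserves mass for unseen tokens.
            score += math.log(
                (word_counts[label][token] + 1) / (n_words + len(vocab) + 1)
            )
        log_scores[label] = score
    # Normalise in log space to avoid underflow on long sentences.
    peak = max(log_scores.values())
    exp = {label: math.exp(s - peak) for label, s in log_scores.items()}
    return exp[1] / (exp[0] + exp[1])


def bootstrap_expand(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the labeled set with confidently self-labeled sentences.

    Hypothetical rule (not taken from the paper): a sentence is added as
    label 1 if P(opinion) >= threshold, as label 0 if
    P(opinion) <= 1 - threshold, and otherwise stays in the pool.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb(labeled)
        confident, remaining = [], []
        for tokens in pool:
            p = predict_proba(model, tokens)
            if p >= threshold:
                confident.append((tokens, 1))
            elif p <= 1 - threshold:
                confident.append((tokens, 0))
            else:
                remaining.append(tokens)
        if not confident:
            break  # Nothing new was learned in this round; stop early.
        labeled.extend(confident)
        pool = remaining
    return labeled, pool


# Toy usage with made-up data.
seed = [
    (["i", "think", "very", "good"], 1),
    (["i", "feel", "too", "bad"], 1),
    (["today", "released", "new", "product"], 0),
    (["company", "located", "beijing"], 0),
]
unlabeled = [
    ["i", "think", "too", "bad"],
    ["company", "released", "product"],
    ["weather"],
]
expanded, leftover = bootstrap_expand(seed, unlabeled)
print(len(expanded), "labeled sentences;", len(leftover), "left unlabeled")
```

In this sketch the expanded set would then be used to train each of the three classifiers. Retraining after every round lets sentences labeled in earlier rounds influence later ones, which is also how labeling errors can propagate, so the threshold is the main knob to tune.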

A Method for Chinese Opinion Sentence Identification Based on the Ensemble Classifier with BootStrapping
LV Yunyun, LI Yang, WANG Suge. A Method for Chinese Opinion Sentence Identification Based on the Ensemble Classifier with BootStrapping [J]. Journal of Chinese Information Processing, 2013, 27(5): 84-93.
Authors: LV Yunyun, LI Yang, WANG Suge
Affiliation: 1. School of Computer & Information Technology, Shanxi University, Taiyuan, Shanxi 030006, China; 2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan, Shanxi 030006, China
Abstract: Large-scale, high-quality, domain-specific labeled training data is essential for building a high-performance classifier, but labeling a large corpus in a given domain is expensive. In this paper, we propose a method for identifying Chinese opinion sentences using a small-scale labeled corpus. First, the method uses BootStrapping to expand the small-scale labeled corpus. Using the expanded corpus, we then train three classifiers based on naive Bayes, support vector machine, and maximum entropy, respectively. Finally, an ensemble classifier is obtained by assigning a set of probability weights to the three trained classifiers. Experimental results indicate that the ensemble classifier outperforms each of the three single classifiers, and that the proposed method, using only partially labeled training data, achieves results close to those obtained with fully labeled training data.
Keywords: opinion sentence identification; BootStrapping; ensemble classifier
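
The ensemble step in the abstract, where weights are assigned to the three trained classifiers, can be sketched as a weighted average of their opinion probabilities. The record does not say how the paper derives its weights, so this sketch assumes, purely for illustration, that each weight is proportional to the classifier's accuracy on held-out data. The function names `weights_from_accuracy`, `ensemble_proba`, and `classify`, the 0.5 decision threshold, and all numbers in the usage lines are hypothetical.

```python
def weights_from_accuracy(accuracies):
    """Turn held-out accuracies into weights that sum to 1.

    Assumption: weights proportional to accuracy. The paper's actual
    weighting scheme is not given in this record.
    """
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("at least one accuracy must be positive")
    return [a / total for a in accuracies]


def ensemble_proba(probs, weights):
    """Weighted average of per-classifier P(opinion) values."""
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p for w, p in zip(weights, probs)) / total


def classify(probs, weights, threshold=0.5):
    """Return 1 (opinion sentence) if the weighted probability reaches the threshold."""
    return 1 if ensemble_proba(probs, weights) >= threshold else 0


# Toy usage with made-up numbers, ordered as
# (naive Bayes, support vector machine, maximum entropy).
weights = weights_from_accuracy([0.80, 0.90, 0.70])
sentence_probs = [0.9, 0.2, 0.6]
print(ensemble_proba(sentence_probs, weights))
print(classify(sentence_probs, weights))
```

Averaging probabilities rather than counting votes lets a confident classifier outweigh two lukewarm ones. A support vector machine does not produce probabilities natively, so in practice its scores would first need calibration before entering this average.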
This article is indexed in CNKI, Wanfang Data, and other databases.