
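A Semi-Supervised Text Classification Method Constrained by Class Associated Words
Cite this article: HAN Hong-qi, ZHU Dong-hua, LIU Song, WANG Xue-feng. Semi-supervised text classification using class associated words [J]. Computer Engineering and Applications, 2010, 46(4): 113-116.
Authors: HAN Hong-qi  ZHU Dong-hua  LIU Song  WANG Xue-feng
Affiliation: 1. School of Management and Economics, Beijing Institute of Technology, Beijing 100081; School of Management and Economics, North China University of Water Conservancy and Electric Power, Zhengzhou 450011
2. School of Management and Economics, Beijing Institute of Technology, Beijing 100081
Funding: National Soft Science Program, No. 2008GXS3K056
Abstract: This paper addresses the problem of classifying unlabeled text documents when no training set is available. Class associated words are words or phrases that relate to, and reflect, the subject of a class. The prior information they provide is used to form prior probabilities for document classification. A naive Bayes classifier is then combined with the EM iterative algorithm, and classification constraints are added to the semi-supervised learning process, so that the class associated words supervise the construction of a classifier that labels fully unlabeled documents. Experimental results show that the method classifies text with relatively high accuracy when no training set is available, and that accuracy under class associated word constraints is higher than accuracy without constraints.

Keywords: semi-supervised  text classification  class associated words  Expectation-Maximization (EM)  naive Bayes
Received: 2009-2-6
Revised: 2009-3-26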

Semi-supervised text classification using class associated words
HAN Hong-qi, ZHU Dong-hua, LIU Song, WANG Xue-feng. Semi-supervised text classification using class associated words [J]. Computer Engineering and Applications, 2010, 46(4): 113-116.
Authors: HAN Hong-qi  ZHU Dong-hua  LIU Song  WANG Xue-feng
Affiliation: 1. School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China; 2. School of Management and Economics, North China University of Water Conservancy and Electric Power, Zhengzhou 450011, China
Abstract: This paper addresses the problem of classifying unlabeled text documents without a training set. Class associated words are words that represent the subject of a class and provide prior knowledge for training a classifier. A learning algorithm that combines Expectation-Maximization (EM) with a naive Bayes classifier is introduced to classify fully unlabeled documents using class associated words. In the algorithm, class associated words set classification constraints during learning, which restrict documents to the corresponding class labels and improve classification accuracy. Experimental results show that the technique solves the problem with high accuracy, and that classification accuracy with constraints is higher than without constraints.
Keywords: semi-supervised  text classification  class associated words  Expectation-Maximization  naive Bayes
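
The abstract describes the approach only in outline, so the following is a minimal sketch of one way the idea could be realized, not the authors' implementation. It assumes a multinomial naive Bayes model with Laplace smoothing, initial class probabilities proportional to counts of class associated words, and a constraint rule under which a document containing associated words of exactly one class is pinned to that class during EM. The function names, the constraint rule, the smoothing values, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter

def keyword_priors(docs, class_words, smoothing=1.0):
    """Initial P(class | doc) from counts of class associated words (assumed rule)."""
    priors = []
    for tokens in docs:
        counts = Counter(tokens)
        scores = [smoothing + sum(counts[w] for w in words) for words in class_words]
        total = sum(scores)
        priors.append([s / total for s in scores])
    return priors

def find_constraints(docs, class_words):
    """Pin a document to a class when it matches exactly one class (assumed rule)."""
    constraints = []
    for tokens in docs:
        hits = [c for c, words in enumerate(class_words) if words & set(tokens)]
        constraints.append(hits[0] if len(hits) == 1 else None)
    return constraints

def m_step(docs, resp, vocab, n_classes):
    """Re-estimate naive Bayes parameters from soft class memberships."""
    class_prior = [(1.0 + sum(r[c] for r in resp)) / (n_classes + len(docs))
                   for c in range(n_classes)]
    word_prob = []
    for c in range(n_classes):
        counts = {w: 1.0 for w in vocab}  # Laplace smoothing
        for tokens, r in zip(docs, resp):
            for w in tokens:
                counts[w] += r[c]
        total = sum(counts.values())
        word_prob.append({w: v / total for w, v in counts.items()})
    return class_prior, word_prob

def e_step(docs, class_prior, word_prob, constraints):
    """Recompute P(class | doc); constrained documents keep their pinned class."""
    n_classes = len(class_prior)
    resp = []
    for tokens, fixed in zip(docs, constraints):
        if fixed is not None:
            resp.append([1.0 if c == fixed else 0.0 for c in range(n_classes)])
            continue
        log_post = [math.log(class_prior[c]) +
                    sum(math.log(word_prob[c][w]) for w in tokens)
                    for c in range(n_classes)]
        top = max(log_post)  # subtract the max for numerical stability
        weights = [math.exp(v - top) for v in log_post]
        total = sum(weights)
        resp.append([w / total for w in weights])
    return resp

def classify(docs, class_words, n_iter=10):
    """Run constrained EM over fully unlabeled documents and return class indices."""
    vocab = {w for tokens in docs for w in tokens}
    n_classes = len(class_words)
    constraints = find_constraints(docs, class_words)
    resp = keyword_priors(docs, class_words)
    for _ in range(n_iter):
        class_prior, word_prob = m_step(docs, resp, vocab, n_classes)
        resp = e_step(docs, class_prior, word_prob, constraints)
    return [max(range(n_classes), key=r.__getitem__) for r in resp]

# Toy corpus (hypothetical): two sports documents, two finance documents.
docs = [
    ["football", "match", "goal", "team"],
    ["team", "goal", "coach", "season"],
    ["stock", "market", "shares", "bank"],
    ["bank", "shares", "investor", "profit"],
]
class_words = [{"football"}, {"stock"}]  # class 0: sports, class 1: finance
labels = classify(docs, class_words)
print(labels)
```

In this toy run only the first and third documents contain a class associated word. The other two receive their labels through vocabulary shared with the constrained documents, which is the effect the abstract attributes to supervision by class associated words.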
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.