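Naive Bayes Classification Algorithm of Feature Weighting Based on Two-Dimensional Information Gain
Cite this article: REN Shi-Chao, HUANG Zi-Liang. Naive Bayes Classification Algorithm of Feature Weighting Based on Two-Dimensional Information Gain[J]. Computer Systems & Applications, 2019, 28(6): 135-140.
Authors: REN Shi-Chao, HUANG Zi-Liang
Affiliation: School of Communication Engineering, Chengdu University of Information Engineering, Chengdu 610225, China (both authors)
Abstract: Because naive Bayes assumes that features are independent, and because traditional TF-IDF weighting considers only how a feature is distributed across the whole training set while ignoring its relationship to categories and documents, the weights that traditional methods assign do not accurately represent the features. To address these problems, a naive Bayes classification algorithm weighted by two-dimensional information gain is proposed. It further takes into account how a feature's two-dimensional information gain, namely its feature-category information gain and its feature-document information gain, affects classification. In experiments against the traditional weighted naive Bayes algorithm, the proposed algorithm improves precision, recall, and F1 by about 6%.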

Keywords: naive Bayes; text classification; feature weighting; two-dimensional information gain; weighting algorithm
Received: 2018/11/2
Revised: 2018/11/23

Naive Bayes Classification Algorithm of Feature Weighting Based on Two-Dimensional Information Gain
REN Shi-Chao and HUANG Zi-Liang. Naive Bayes Classification Algorithm of Feature Weighting Based on Two-Dimensional Information Gain[J]. Computer Systems & Applications, 2019, 28(6): 135-140.
Authors: REN Shi-Chao and HUANG Zi-Liang
Affiliation: School of Communication Engineering, Chengdu University of Information Engineering, Chengdu 610225, China (both authors)
Abstract: The naive Bayes algorithm rests on a feature-independence assumption, and the traditional TF-IDF weighting algorithm considers only the distribution of a feature over the whole training set while ignoring the relationship between the feature and categories or documents, so the weights assigned by traditional methods do not accurately represent the features. To address these problems, this study proposes a naive Bayes classification algorithm with feature weighting based on two-dimensional information gain. It takes into account the effect on classification of a feature's two-dimensional information gain, that is, its feature-category information gain and its feature-document information gain. Compared with the traditional feature-weighted naive Bayes algorithm, the proposed algorithm improves precision, recall, and F1 by about 6% in the reported experiments.
Keywords:naive Bayes  text classification  feature weighting  two-dimensional information gain  weighting algorithm
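
The abstract names the technique but does not give its formulas, so the following is only a minimal sketch of the general idea of information-gain feature weighting in naive Bayes, not the authors' algorithm. It computes the standard information gain of a term with respect to the class label (one plausible reading of "information gain of category") and uses it as an exponent-style weight on the per-term log-likelihood of a multinomial naive Bayes model. The document-level information gain is omitted because its definition is not stated here. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def category_information_gain(docs, labels, term):
    """IG of the binary event 'term occurs in the document' w.r.t. the class."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def train(docs, labels):
    """Fit a multinomial naive Bayes model plus one IG weight per term."""
    vocab = sorted({t for d in docs for t in d})
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(d)
    loglik = {}
    for c in classes:
        total = sum(counts[c].values()) + len(vocab)  # Laplace smoothing
        loglik[c] = {t: math.log((counts[c][t] + 1) / total) for t in vocab}
    # Assumption: the weight is the category IG alone (no document-level term).
    weights = {t: category_information_gain(docs, labels, t) for t in vocab}
    return prior, loglik, weights


def predict(doc, prior, loglik, weights):
    """Return the class maximizing log P(c) + sum_t w_t * log P(t | c)."""
    scores = {
        c: prior[c] + sum(weights[t] * loglik[c][t] for t in doc if t in weights)
        for c in prior
    }
    return max(scores, key=scores.get)


# Toy corpus: each document is a list of tokens.
docs = [
    ["goal", "match", "team"],
    ["team", "win", "match"],
    ["stock", "market", "price"],
    ["market", "price", "fall"],
]
labels = ["sports", "sports", "finance", "finance"]

prior, loglik, weights = train(docs, labels)
print(predict(["team", "match"], prior, loglik, weights))
```

In this sketch a term that separates the classes perfectly (such as "match" in the toy corpus) receives the maximum weight of 1 bit, while a term whose presence says little about the class contributes almost nothing to the score.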