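An Irrelevant Information Preprocess Based on the Minimal Class Difference
Cite this article: CHEN Zhi-ping, LIN Ya-ping, PENG Ya, WANG Lei, TONG Tiao-sheng. An Irrelevant Information Preprocess Based on the Minimal Class Difference[J]. Acta Electronica Sinica, 2003, 31(11): 1750-1753.
Authors: CHEN Zhi-ping, LIN Ya-ping, PENG Ya, WANG Lei, TONG Tiao-sheng
Affiliation: College of Computer and Communication, Hunan University, Changsha, Hunan 410082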
Funding: National Natural Science Foundation of China (No. 60272051); Natural Science Foundation of Hunan Province (No. 01jjy1007)
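Abstract: To reduce the impact of irrelevant information on text classification accuracy, a preprocessing algorithm based on the minimal class difference is proposed. The algorithm analyzes how text features are distributed within classes and divides the features into three types. According to how much each feature's distribution differs across classes, it retains the single-class and multi-class features that contribute to classification and filters out the general features, whose class distribution difference is small. Experimental results show that the classification accuracy obtained when the new algorithm is used for preprocessing is clearly better than that obtained with preprocessing algorithms such as information gain and mutual information.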

Keywords: information gain; mutual information; naive Bayesian
Article ID: 0372-2112(2003)11-1750-04
Received: 2002-12-17

An Irrelevant Information Preprocess Based on the Minimal Class Difference
CHEN Zhi-ping, LIN Ya-ping, PENG Ya, WANG Lei, TONG Tiao-sheng. An Irrelevant Information Preprocess Based on the Minimal Class Difference[J]. Acta Electronica Sinica, 2003, 31(11): 1750-1753.
Authors: CHEN Zhi-ping, LIN Ya-ping, PENG Ya, WANG Lei, TONG Tiao-sheng
Affiliation: College of Computer and Communication, Hunan University, Changsha, Hunan 410082, China
Abstract: A preprocessing algorithm for irrelevant features, based on the minimal class difference, is proposed. It computes each feature's class distribution difference from the feature's distribution across classes and then divides the features into three types. The new algorithm keeps the single-class and multi-class features, which contribute to classification, and filters out the general features, which are of little use for classification. Experimental results show that the new algorithm yields better classification performance than algorithms such as information gain, mutual information, and cross entropy.
Keywords: information gain; mutual information; naive Bayesian
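
The abstract describes the three-way feature split only in words and does not give the formula for the class distribution difference, so the following Python sketch is an illustration under stated assumptions and not the authors' method. It assumes the difference of a feature is the gap between its highest and lowest per-class document frequency, and it uses a hypothetical threshold `min_difference`; the function names are likewise invented for this example.

```python
from collections import defaultdict


def class_document_frequency(docs):
    """Return {feature: {label: fraction of that class's documents containing it}}.

    docs is a list of (label, tokens) pairs.
    """
    class_sizes = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for label, tokens in docs:
        class_sizes[label] += 1
        for tok in set(tokens):  # count each feature once per document
            counts[tok][label] += 1
    labels = sorted(class_sizes)
    return {
        tok: {c: per_class.get(c, 0) / class_sizes[c] for c in labels}
        for tok, per_class in counts.items()
    }


def categorize_features(docs, min_difference=0.5):
    """Label each feature 'single-class', 'multi-class', or 'general'.

    ASSUMPTION: class difference = max - min per-class document frequency;
    min_difference is a hypothetical cut-off, not a value from the paper.
    """
    kinds = {}
    for tok, per_class in class_document_frequency(docs).items():
        present = [c for c, f in per_class.items() if f > 0]
        difference = max(per_class.values()) - min(per_class.values())
        if len(present) == 1:
            kinds[tok] = "single-class"
        elif difference >= min_difference:
            kinds[tok] = "multi-class"
        else:
            kinds[tok] = "general"
    return kinds


def filter_general(docs, min_difference=0.5):
    """Drop 'general' features from every document before classification."""
    kinds = categorize_features(docs, min_difference)
    return [
        (label, [t for t in tokens if kinds[t] != "general"])
        for label, tokens in docs
    ]
```

The filtered documents could then be passed to any classifier, such as the naive Bayesian classifier named in the keywords; the right threshold would have to be tuned on held-out data.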
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.