Feature selection algorithm for text classification based on improved mutual information
Authors:CONG Shuai, ZHANG Ji-bin, XU Zhi-ming and WANG Yu-ying
Affiliation:School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Abstract:To address the poor text classification performance obtained with the traditional mutual information (MI) formula, a feature selection algorithm based on improved mutual information is proposed. Building on earlier improved MI methods that enhance the MI value of negative features and of feature frequency, the algorithm introduces the concepts of concentration degree and dispersion degree. Formulas embodying concentration degree and dispersion degree are constructed, and the improved mutual information is implemented on this basis. The resulting feature selection algorithm is applied to a text classifier based on Biomimetic Pattern Recognition and compared with several other feature selection methods. The experimental results show that the improved MI feature selection method performs much better than traditional MI feature selection methods and also better than information gain. By introducing concentration degree and dispersion degree, the improved MI feature selection method greatly improves the performance of the text classification system.
Keywords:text classification  feature selection  improved mutual information  Biomimetic Pattern Recognition
Source:Journal of Harbin Institute of Technology (English Edition) 《哈尔滨工业大学学报(英文版)》. Indexed in CNKI, VIP (维普), Wanfang Data (万方数据) and other databases.
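
For orientation, the traditional MI baseline that the abstract sets out to improve can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not give the concentration degree or dispersion degree formulas, so none are reproduced here. The function names (`mutual_information_scores`, `select_features`), the use of document frequency for probability estimates, and the choice to take the maximum MI over categories are all assumptions made for this sketch.

```python
import math
from collections import Counter, defaultdict


def mutual_information_scores(docs, labels):
    """Score each term by its maximum pointwise MI over the categories.

    docs   : list of token lists, one per document
    labels : list of category labels, aligned with docs
    Probabilities are estimated from document counts (an assumption).
    """
    n_docs = len(docs)
    class_count = Counter(labels)        # documents per category
    term_count = Counter()               # documents containing the term
    joint_count = defaultdict(Counter)   # term -> category -> documents

    for tokens, label in zip(docs, labels):
        for term in set(tokens):         # count each term once per document
            term_count[term] += 1
            joint_count[term][label] += 1

    scores = {}
    for term, per_class in joint_count.items():
        best = float("-inf")
        for label, n_tc in per_class.items():
            # MI(t, c) = log( P(t, c) / (P(t) * P(c)) )
            mi = math.log((n_tc * n_docs) / (term_count[term] * class_count[label]))
            best = max(best, mi)
        scores[term] = best
    return scores


def select_features(docs, labels, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = mutual_information_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus, invented for illustration only.
docs = [
    ["ball", "goal", "the"],
    ["goal", "team", "the"],
    ["vote", "law", "the"],
    ["law", "court", "the"],
]
labels = ["sport", "sport", "politics", "politics"]
scores = mutual_information_scores(docs, labels)
selected = select_features(docs, labels, 6)
```

In this toy corpus the term "ball", which occurs in a single document, receives the same score as "goal", which occurs in every sport document. Plain MI ignores how often and how evenly a term appears within a category, which is the kind of weakness the frequency, concentration degree and dispersion degree corrections described in the abstract are meant to address.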