Feature selection algorithm for text classification based on improved mutual information |
| |
Authors: | CONG Shuai, ZHANG Ji-bin, XU Zhi-ming and WANG Yu-ying |
| |
Affiliation: | School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China |
| |
Abstract: | To address the poor text classification performance obtained with the traditional mutual information (MI) formula, a feature selection algorithm based on improved mutual information is proposed. The algorithm builds on existing improved MI methods, which strengthen the contribution of negative features and of feature frequency to the MI value, and adds the concepts of concentration degree and dispersion degree. Formulas embodying concentration degree and dispersion degree are constructed, and the improved mutual information is implemented on top of them. The resulting feature selection algorithm is applied to a text classifier based on Biomimetic Pattern Recognition and compared with several other feature selection methods. The experimental results show that the improved MI feature selection method greatly outperforms traditional MI feature selection and also performs better than information gain. By introducing concentration degree and dispersion degree, the improved MI feature selection method greatly improves the performance of the text classification system. |
| |
Keywords: | text classification; feature selection; improved mutual information; Biomimetic Pattern Recognition |
Source: | Journal of Harbin Institute of Technology (English Edition) |
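For orientation, the traditional MI criterion that the abstract takes as its baseline can be sketched as below. This is a minimal illustration of the textbook document-frequency form, MI(t, c) = log(P(t, c) / (P(t) P(c))), with the per-term score taken as the maximum over classes. It is not the authors' improved formula: the abstract does not give the concentration degree or dispersion degree formulas, so they are not reproduced here. The function names `mi_scores` and `select_features` and the toy corpus are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def mi_scores(docs, labels):
    """Score each term by max over classes of log(P(t, c) / (P(t) P(c))).

    docs   : list of token lists, one per document
    labels : list of class labels, aligned with docs
    Probabilities are estimated from document frequencies.
    """
    n = len(docs)
    class_count = Counter(labels)   # number of documents in each class
    term_count = Counter()          # number of documents containing each term
    joint = defaultdict(Counter)    # joint[term][cls]: docs of cls containing term

    for tokens, cls in zip(docs, labels):
        for term in set(tokens):    # count each term once per document
            term_count[term] += 1
            joint[term][cls] += 1

    scores = {}
    for term, per_class in joint.items():
        # Classes where the term never occurs give log(0) = -inf and can
        # never be the maximum, so only observed (term, class) pairs are used.
        scores[term] = max(
            math.log(a * n / (term_count[term] * class_count[cls]))
            for cls, a in per_class.items()
        )
    return scores


def select_features(docs, labels, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = mi_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "rare"],
    ["stock", "market", "team"],
    ["market", "bank"],
]
labels = ["sport", "sport", "finance", "finance"]

for term, score in sorted(mi_scores(docs, labels).items()):
    print(f"{term:8s} {score:.3f}")
```

In this toy corpus a term that occurs in a single document ("rare") receives the same score as one that occurs in every document of its class ("market"). That insensitivity to term frequency is a well-known weakness of plain MI and is the kind of problem that frequency-aware variants, such as those the abstract refers to, aim to correct.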