An empirical comparison of min–max-modular k-NN with different voting methods to large-scale text categorization |
| |
Authors: | Ke Wu, Bao-Liang Lu, Masao Utiyama, Hitoshi Isahara |
| |
Affiliation: | (1) Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai, 200240, China; (2) Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology, 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0289, Japan |
| |
Abstract: | Text categorization is the task of assigning pre-defined classes to text documents based on their content. The k-NN algorithm is one of the top-performing classifiers on text data. However, there has been little research on the use of different
voting methods over text data. Moreover, when a huge amount of training data is available online, the response speed slows down,
since the distance between a test document and every training document has to be computed. Min–max-modular k-NN (M3-k-NN), on the other hand, has been applied to large-scale text categorization. M3-k-NN achieves good performance and a faster response speed in a parallel computing environment. In this paper, we investigate
five different voting methods for k-NN and M3-k-NN. The experimental results and analysis show that the Gaussian voting method achieves the best performance among all
voting methods for both k-NN and M3-k-NN. In addition, M3-k-NN achieves better performance than k-NN with a smaller value of k, and is thus faster than k-NN in a parallel computing environment.
The work of K. Wu and B. L. Lu was supported in part by the National Natural Science Foundation of China under the grants
NSFC 60375022 and NSFC 60473040, and the Microsoft Laboratory for Intelligent Computing and Intelligent Systems of Shanghai
Jiao Tong University. |
| |
Keywords: | Text categorization; k-NN algorithm; Min–max-modular k-NN; Parallel computing |
This article is indexed in SpringerLink and other databases. |
|