首页 | 官方网站   微博 | 高级检索  
     


An empirical comparison of min–max-modular <Emphasis Type="Italic">k</Emphasis>-NN with different voting methods to large-scale text categorization
Authors:Ke Wu  Bao-Liang Lu  Masao Utiyama  Hitoshi Isahara
Affiliation:(1) Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dong Chuan Road, Shanghai, 200240, China;(2) Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology, 3-5 Hilaridai, Seika-cho, Soraku-gun, Kyoto 619-0289, Japan
Abstract:Text categorization refers to the task of assigning the pre-defined classes to text documents based on their content. k-NN algorithm is one of top performing classifiers on text data. However, there is little research work on the use of different voting methods over text data. Also, when a huge number of training data is available online, the response speed slows down, since a test document has to obtain the distance with each training data. On the other hand, min–max-modular k-NN (M3-k-NN) has been applied to large-scale text categorization. M3-k-NN achieves a good performance and has faster response speed in a parallel computing environment. In this paper, we investigate five different voting methods for k-NN and M3-k-NN. The experimental results and analysis show that the Gaussian voting method can achieve the best performance among all voting methods for both k-NN and M3-k-NN. In addition, M3-k-NN uses less k-value to achieve the better performance than k-NN, and thus is faster than k-NN in a parallel computing environment. The work of K. Wu and B. L. Lu was supported in part by the National Natural Science Foundation of China under the grants NSFC 60375022 and NSFC 60473040, and the Microsoft Laboratory for Intelligent Computing and Intelligent Systems of Shanghai Jiao Tong University.
Keywords:Text categorization            k-NN algorithm  Min–  max-modular k-NN  Parallel computing
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号