K-nearest neighbor text categorization algorithm based on vector projection
(基于向量投影的KNN文本分类算法)
Cite this article: BU Fan-jun, QIAN Xue-zhong. K-nearest neighbor text categorization algorithm based on vector projection[J]. Computer Engineering and Design, 2009, 30(21).
Authors: BU Fan-jun (卜凡军), QIAN Xue-zhong (钱雪忠)
Affiliation: School of Information Engineering, Jiangnan University, Wuxi, Jiangsu 214122
Funding: Natural Science Foundation of Jiangsu Province
Abstract: To address the long classification time of the K-nearest neighbor (KNN) algorithm, methods for improving classification efficiency are analyzed. Building on KNN and combining vector projection theory with the iDistance index structure, an improved KNN algorithm named PKNN is proposed. PKNN compares the one-dimensional projection distances between the sample to be classified and the training samples to find the most likely neighboring samples. This reduces the number of training samples that take part in the computation, and therefore the amount of computation needed for each classification. Experimental results show that PKNN markedly improves the efficiency of KNN, and that its underlying principle makes it well suited to large-scale, high-dimensional text categorization.

Keywords: K-nearest neighbor; text categorization; projection; efficiency; high-dimensional
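
The abstract does not spell out PKNN's internals, so the following is only a minimal sketch of the general idea it describes: using one-dimensional projection distances to skip training samples that cannot be among the nearest neighbors. It assumes Euclidean distance and a single fixed projection direction, and it replaces the iDistance index with a plain sorted list and binary search. The class `ProjectionKNN`, its parameters, and the toy data are hypothetical names chosen for illustration, not the authors' implementation. The pruning is exact because the gap between two projections onto a unit vector never exceeds the true distance between the points.

```python
import bisect
import heapq
import math
from collections import Counter


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ProjectionKNN:
    """Hypothetical sketch: exact k-NN with 1-D projection pruning."""

    def __init__(self, vectors, labels, direction):
        norm = math.sqrt(dot(direction, direction))
        self.direction = [d / norm for d in direction]  # unit vector (assumed choice)
        # Sort the training samples by their 1-D projection once, up front.
        order = sorted(range(len(vectors)),
                       key=lambda i: dot(vectors[i], self.direction))
        self.vectors = [vectors[i] for i in order]
        self.labels = [labels[i] for i in order]
        self.keys = [dot(v, self.direction) for v in self.vectors]
        self.distance_evals = 0  # counts full-distance computations

    def classify(self, query, k):
        q = dot(query, self.direction)
        right = bisect.bisect_left(self.keys, q)
        left = right - 1
        n = len(self.keys)
        heap = []  # max-heap of the k best so far, stored as (-distance, index)
        while left >= 0 or right < n:
            # Visit samples in order of increasing projection gap.
            gap_left = q - self.keys[left] if left >= 0 else math.inf
            gap_right = self.keys[right] - q if right < n else math.inf
            if gap_left <= gap_right:
                gap, idx = gap_left, left
                left -= 1
            else:
                gap, idx = gap_right, right
                right += 1
            # The gap is a lower bound on the true distance, so once it
            # exceeds the current k-th best distance no later sample can win.
            if len(heap) == k and gap > -heap[0][0]:
                break
            d = euclidean(query, self.vectors[idx])
            self.distance_evals += 1
            if len(heap) < k:
                heapq.heappush(heap, (-d, idx))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, idx))
        votes = Counter(self.labels[i] for _, i in heap)
        return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two well-separated clusters.
train_vectors = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
train_labels = ["a", "a", "a", "b", "b", "b"]
knn = ProjectionKNN(train_vectors, train_labels, direction=[1, 1])
predicted = knn.classify([0.5, 0.5], k=3)
```

How much work this saves depends on how well the chosen direction spreads the training samples. A direction along which most samples project to nearly the same value prunes very little, which is one reason a real index such as iDistance partitions the data first.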
This article is indexed in Wanfang Data (万方数据) and other databases.