
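An Improved KNN Algorithm for Text Categorization
Citation: WANG Yu, ZHANG Ming, WANG Zheng-ou, BAI Shi. An Improved KNN Algorithm for Text Categorization[J]. Computer Engineering and Applications, 2007, 43(13): 159-162
Authors: WANG Yu, ZHANG Ming, WANG Zheng-ou, BAI Shi
Affiliations: School of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002; Institute of Systems Engineering, Tianjin University, Tianjin 300072; Cangzhou Urban Construction Archives, Cangzhou, Hebei 061000
Abstract: The sensitivity method is used to correct the weights of text features in the distance formula. A method for pruning the training sample library, based on the CURE algorithm and the Tabu algorithm, is proposed. The CURE clustering algorithm first obtains representative samples of each cluster, which form a new training sample set; the Tabu algorithm then further maintains this set by adding or deleting samples. When samples are added, only samples at the boundaries between different classes are considered. Samples are added or deleted according to two principles: the highest classification accuracy and the smallest distance to the original training sample library.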
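The feature-weighted KNN distance described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' method: the paper derives feature sensitivity from neural network theory, and the abstract does not give that formula. Here a hypothetical perturbation sensitivity is used instead: a feature's weight is proportional to the drop in leave-one-out KNN accuracy when that feature is switched off. Documents are assumed to already be dense numeric vectors (for example TF-IDF), and all function names are illustrative.

```python
import math
from collections import Counter


def weighted_distance(a, b, w):
    # Weighted Euclidean distance: each squared feature difference is scaled by its weight
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))


def knn_predict(train, query, w, k=3):
    # train is a list of (feature_vector, label); majority vote over the k nearest samples
    nearest = sorted(train, key=lambda s: weighted_distance(s[0], query, w))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(train, w, k=3):
    # Leave-one-out accuracy: classify each sample using all the other samples
    hits = 0
    for i, (x, label) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        hits += knn_predict(rest, x, w, k) == label
    return hits / len(train)


def sensitivity_weights(train, k=3):
    # Hypothetical perturbation sensitivity (NOT the paper's neural-network formula):
    # weight_j is proportional to the accuracy lost when feature j is switched off
    dim = len(train[0][0])
    uniform = [1.0] * dim
    base = loo_accuracy(train, uniform, k)
    raw = []
    for j in range(dim):
        w = uniform.copy()
        w[j] = 0.0
        raw.append(max(base - loo_accuracy(train, w, k), 0.0))
    total = sum(raw)
    # Fall back to equal weights if removing any single feature never hurts accuracy
    return [r / total for r in raw] if total > 0 else [1.0 / dim] * dim
```

On a toy set where feature 0 separates the two classes and feature 1 is noise, switching off feature 0 breaks the classifier while switching off feature 1 does not, so `sensitivity_weights` assigns all weight to feature 0 and the noisy feature no longer influences the neighbour ranking.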

Keywords: text classification; KNN algorithm; sensitivity method; CURE clustering algorithm; Tabu algorithm
Article ID: 1002-8331(2007)13-0159-04
Received: 2006-09-15
Revised: 2006-12-01

An Improved KNN Algorithm Applied to Text Categorization
WANG Yu,ZHANG Ming,WANG Zheng-ou,BAI Shi. An Improved KNN Algorithm Applied to Text Categorization[J]. Computer Engineering and Applications, 2007, 43(13): 159-162
Authors:WANG Yu  ZHANG Ming  WANG Zheng-ou  BAI Shi
Affiliation: 1. School of Computer and Mathematics, Hebei University, Baoding, Hebei 071002, China; 2. Institute of Systems Engineering, Tianjin University, Tianjin 300072, China; 3. Urban Construction Archives of Cangzhou, Cangzhou, Hebei 061000, China
Abstract: In this paper, based on neural network theory, the weights of features are first adjusted using the sensitivity method. A method is then presented for pruning the training samples of the KNN algorithm. First, a set of representative samples is obtained from the training set with the CURE clustering algorithm; this representative set is taken as the initial set of the Tabu algorithm for further maintenance. When samples are inserted into the new training set, only samples at the borders between different classes are considered. Samples are deleted or inserted according to two principles: higher categorization accuracy and higher similarity to the original training set. The work of pruning and maintaining the training sample set is greatly reduced, and both satisfactory classification speed and satisfactory accuracy are achieved.
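The Tabu maintenance step can be sketched as a local search over which training samples to keep. This is a minimal sketch under assumptions not stated in the abstract: the initial subset is passed in as sample indices (in the paper it comes from CURE cluster representatives); accuracy is 1-NN accuracy on the original training set; "similarity with the training set" is approximated by the mean distance from each original sample to its nearest retained sample; a "border" sample is hypothetically defined as one whose nearest other sample has a different label; and the usual aspiration criterion is omitted. All names (`tabu_prune`, `border_samples`, `tenure`, etc.) are illustrative.

```python
import math


def nn_label(subset, X, y, q):
    # 1-NN: label of the retained sample closest to q
    best = min(subset, key=lambda i: math.dist(X[i], q))
    return y[best]


def accuracy(subset, X, y):
    # Fraction of the ORIGINAL training samples that 1-NN over `subset` labels correctly
    if not subset:
        return 0.0
    hits = sum(nn_label(subset, X, y, X[i]) == y[i] for i in range(len(X)))
    return hits / len(X)


def coverage_distance(subset, X):
    # Mean distance from each original sample to its nearest retained sample;
    # smaller means the reduced set stays closer to the original library
    return sum(min(math.dist(x, X[j]) for j in subset) for x in X) / len(X)


def score(subset, X, y):
    # Compare by accuracy first, then by closeness (negated distance) to the original set
    return (accuracy(subset, X, y), -coverage_distance(subset, X))


def border_samples(X, y):
    # Hypothetical border test: the sample's nearest other sample has a different label
    border = set()
    for i, x in enumerate(X):
        j = min((k for k in range(len(X)) if k != i), key=lambda k: math.dist(x, X[k]))
        if y[j] != y[i]:
            border.add(i)
    return border


def tabu_prune(X, y, initial, iters=20, tenure=3):
    current = set(initial)
    best, best_score = set(current), score(current, X, y)
    border = border_samples(X, y)
    tabu_until = {}  # sample index -> last iteration in which flipping it is forbidden
    for it in range(iters):
        moves = []
        for i in range(len(X)):
            if tabu_until.get(i, -1) >= it:
                continue
            if i in current and len(current) > 1:
                moves.append((i, current - {i}))  # delete a retained sample
            elif i not in current and i in border:
                moves.append((i, current | {i}))  # add only class-border samples
        if not moves:
            break
        # Take the best non-tabu neighbour even if it is worse than the current set
        i, current = max(moves, key=lambda m: score(m[1], X, y))
        tabu_until[i] = it + tenure
        s = score(current, X, y)
        if s > best_score:
            best, best_score = set(current), s
    return sorted(best), best_score
```

For example, with one-dimensional samples `[(0.0,), (1.0,), (2.0,), (2.7,), (3.2,), (4.0,), (5.0,), (6.0,)]` labelled four `"a"` then four `"b"`, and a crude initial set `[0, 6]` that misclassifies the sample at 2.7, the search adds the two border samples (indices 3 and 4) and reaches full accuracy on the original set. Restricting additions to border samples keeps the neighbourhood small, which is the point of the rule described in the abstract.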
Keywords: text categorization; KNN algorithm; sensitivity method; CURE clustering algorithm; Tabu algorithm
This article is indexed in CNKI, VIP, Wanfang Data and other databases.
