
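Research on the Framework of Text Classification Algorithms Based on Clustering (original title: 基于聚类的文本分类算法框架研究)
Cite this article: HUANG Xifeng. Research on the Framework of Text Classification Algorithms Based on Clustering[J]. Computer and Digital Engineering, 2021, 49(1): 21-25, 93.
Author: HUANG Xifeng (黄细凤)
Affiliation: No.10 Research Institute of China Electronics Technology Group Corporation, Chengdu 610036
Funding: Project of the No.10 Research Institute of China Electronics Technology Group Corporation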
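Abstract (translated from the Chinese): The KNN algorithm is widely used in text classification because it is easy to understand and theoretically mature. Because KNN must traverse the sample space to compute distances, the computational cost is very high when the training set is large or high-dimensional. To address this problem, the fitness function design of the genetic algorithm is first combined with the idea of the K-medoids algorithm to form K-GA-medoids, which is then combined with KNN to form an algorithm framework for text classification. During classification, ...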

Keywords: KNN, K-medoids, text classification, cluster analysis, genetic algorithm

Research on the Framework of Text Classification Algorithms Based on Clustering
HUANG Xifeng. Research on the Framework of Text Classification Algorithms Based on Clustering[J]. Computer and Digital Engineering, 2021, 49(1): 21-25, 93.
Author: HUANG Xifeng
Affiliation: No.10 Research Institute of China Electronics Technology Group Corporation, Chengdu 610036
Abstract: The KNN algorithm is widely used in text categorization because it is easy to understand and theoretically mature. However, KNN must traverse the sample space to compute distances, so its computational cost is high when the training set is large or high-dimensional. To address this problem, the fitness function design of the genetic algorithm is first combined with the idea of the K-medoids algorithm to form K-GA-medoids, which is then combined with KNN to form an algorithm framework for text categorization. During classification, clustering is performed first and classification second, which reduces the number of training samples and the computational overhead. Experiments show that K-GA-medoids clusters better than traditional K-medoids and that, compared with the traditional KNN algorithm, the framework effectively improves the efficiency of text categorization while maintaining classification accuracy.
Keywords: KNN, K-medoids, text categorization, cluster analysis, genetic algorithm
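
The "cluster first, then classify" idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify the K-GA-medoids fitness function or how the clusters are used to shrink the training set, so this sketch substitutes plain alternating K-medoids and assumes that each class is reduced to its medoids before KNN is applied. The function names (`k_medoids`, `reduce_training_set`, `knn_predict`), the Euclidean distance, and the parameter `k_per_class` are all hypothetical choices made for illustration; in a text setting the rows of `X` would be document feature vectors such as TF-IDF.

```python
import numpy as np

def k_medoids(X, k, n_iter=20, seed=0):
    """Plain alternating K-medoids (an assumption; NOT the paper's K-GA-medoids).
    Returns the indices of the k medoids within X."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        # Assign every sample to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total
                # distance to the other members of its cluster.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids

def reduce_training_set(X, y, k_per_class):
    """Hypothetical reduction step: keep only the medoids of each class."""
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        k = min(k_per_class, idx.size)
        keep.extend(idx[k_medoids(X[idx], k)])
    keep = np.array(keep)
    return X[keep], y[keep]

def knn_predict(X_train, y_train, x, k=3):
    """Standard KNN majority vote over the (reduced) training set."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(d)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]
```

Under these assumptions, a query is compared against `k_per_class` medoids per class instead of every training sample, which is where the saving in distance computations would come from. Whether accuracy is preserved depends on how well the medoids represent each class.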
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).