
Research on the Optimization of Clustering Algorithm Performance Based on Data Set Compression
Citation: Zhao Yanlong. Research on the Optimization of Clustering Algorithm Performance Based on Data Set Compression[J]. Application Research of Computers, 2018, 35(5).
Authors: Zhao Yanlong
Affiliation: College of Information and Navigation, Air Force Engineering University
Abstract: To address the excessive running time of current clustering algorithms on large data sets, a data set compression algorithm based on nearest-neighbor similarity is proposed. Several data points that are nearest neighbors in similarity are grouped into a data cluster, and a cluster head selected at random from each cluster forms a new data set, which greatly reduces the size of the data. The k-means algorithm and the AP algorithm are then each used to cluster the compressed data set. The experimental results show that, compared with clustering the original data set, clustering the compressed data set effectively reduces the clustering time while keeping the clustering accuracy essentially the same. This improves the clustering performance of the algorithms and demonstrates the effectiveness and reliability of the data set compression algorithm in cluster analysis.

Keywords: clustering; big data; data compression; clustering performance
Received: 2017/1/10
Revised: 2018/3/24
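
The abstract describes the compression step only in outline, so the following is a minimal sketch under stated assumptions rather than the paper's actual procedure. It assumes Euclidean distance as the similarity measure, a fixed group size, and a greedy pass that groups each anchor point with its nearest still-unassigned neighbors. The names `compress`, `expand_labels`, `group_size`, and `seed` are hypothetical and do not come from the paper.

```python
import math
import random


def compress(points, group_size, seed=0):
    """Greedily group each anchor with its nearest unassigned neighbours.

    Returns (heads, groups): groups[i] is a list of point indices and
    heads[i] is the index of a randomly chosen member of groups[i].
    Assumption: Euclidean distance and a fixed group size.
    """
    rng = random.Random(seed)
    unassigned = set(range(len(points)))
    heads, groups = [], []
    while unassigned:
        anchor = min(unassigned)  # next anchor: lowest unassigned index
        unassigned.remove(anchor)
        # Rank the remaining points by distance to the anchor.
        nearest = sorted(
            unassigned, key=lambda j: math.dist(points[anchor], points[j])
        )
        group = [anchor] + nearest[: group_size - 1]
        unassigned.difference_update(group)
        groups.append(group)
        heads.append(rng.choice(group))  # random cluster head
    return heads, groups


def expand_labels(n_points, groups, head_labels):
    """Give every point the cluster label assigned to its group's head."""
    labels = [None] * n_points
    for group, label in zip(groups, head_labels):
        for idx in group:
            labels[idx] = label
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
heads, groups = compress(points, group_size=3)
compressed = [points[h] for h in heads]  # smaller data set to be clustered
```

Any clustering algorithm, such as k-means or AP, could then be run on `compressed`, and `expand_labels` maps the resulting head labels back to the original points. This greedy pass computes distances from each anchor to all remaining points, so it is quadratic in the worst case; a spatial index would be one way to reduce that cost on large data sets.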