
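An Improved K-means Algorithm based on MapReduce Framework
Cite this article: YIN Aiying, WU Yunbing, ZHU Minchen, ZHANG Ying. An Improved K-means Algorithm based on MapReduce Framework[J]. Application Research of Computers (计算机应用研究), 2018, 35(8).
Authors: YIN Aiying (阴爱英), WU Yunbing (吴运兵), ZHU Minchen (朱敏琛), ZHANG Ying (张莹)
Affiliations (in author order): Department of Computer Engineering, Zhicheng College, Fuzhou University; College of Mathematics and Computer Science, Fuzhou University; Department of Computer Engineering, Zhicheng College, Fuzhou University; Department of Computer Engineering, Zhicheng College, Fuzhou University
Funding: Natural Science Foundation of Fujian Province (No. 2017J01755); Education and Research Project for Young and Middle-aged Teachers, Fujian Provincial Department of Education (No. JAT160658, No. JAT160077); Fujian Province Science and Technology Plan Project (No. 2016R0095)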
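Abstract: To address the unstable clustering results and slow convergence of K-means on massive data sets, an improved K-means algorithm based on the MapReduce framework is proposed. First, to obtain the initial number of clusters for K-means, the data set is clustered with agglomerative hierarchical clustering and the result is given a preliminary evaluation using the silhouette coefficient; the number of clusters obtained in this way is then used to set the initial cluster centers of the K-means algorithm. Second, to make the method suitable for clustering massive data, the improved K-means algorithm is deployed on the MapReduce framework. Experimental results show that, on a single machine, the method achieves high precision and recall together with strong clustering stability; on a cluster, it also achieves a good speedup ratio and running speed.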

Keywords: MapReduce framework; K-means algorithm; data mining; cluster analysis
Received: 2017/4/12
Revised: 2018/7/4
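
The initialization step summarized in the abstract (agglomerative hierarchical clustering, scored with the silhouette coefficient, to pick the cluster number and starting centers for K-means) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the centroid linkage rule, the candidate range `k_min`..`k_max`, and all function names are assumptions made for illustration, since the abstract does not specify them.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def agglomerate(points, k):
    """Merge clusters (lists of point indices) bottom-up until k remain.

    Assumption: the two clusters with the closest centroids are merged.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: math.dist(
                centroid([points[i] for i in clusters[p[0]]]),
                centroid([points[i] for i in clusters[p[1]]]),
            ),
        )
        clusters[a] += clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    return clusters


def silhouette(points, clusters):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    total = 0.0
    for ci, members in enumerate(clusters):
        if len(members) == 1:
            continue
        for i in members:
            # a: mean distance to the other points of the same cluster
            a = sum(math.dist(points[i], points[j])
                    for j in members if j != i) / (len(members) - 1)
            # b: smallest mean distance to the points of any other cluster
            b = min(
                sum(math.dist(points[i], points[j]) for j in other) / len(other)
                for cj, other in enumerate(clusters) if cj != ci
            )
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / len(points)


def choose_initial_centers(points, k_min=2, k_max=6):
    """Try each candidate k, keep the best silhouette, return its centroids."""
    best_score, best_clusters = -2.0, None
    for k in range(k_min, min(k_max, len(points) - 1) + 1):
        clusters = agglomerate(points, k)
        score = silhouette(points, clusters)
        if score > best_score:
            best_score, best_clusters = score, clusters
    return [centroid([points[i] for i in c]) for c in best_clusters]


# Three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
initial_centers = choose_initial_centers(points)
```

The hierarchical pass here is quadratic or worse in the number of points, so in practice it would presumably be run on a sample rather than on the full data set; the abstract does not say how the authors handle this.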

An Improved K-means Algorithm based on MapReduce Framework
YIN Aiying,WU Yunbing,ZHU Minchen and ZHANG Ying.An Improved K-means Algorithm based on MapReduce Framework[J].Application Research of Computers,2018,35(8).
Authors:YIN Aiying  WU Yunbing  ZHU Minchen and ZHANG Ying
Affiliation:Department of Computer Engineering, Zhicheng College of Fuzhou University, Fuzhou
Abstract:To address the unstable results and slow convergence of the K-means clustering algorithm on massive data sets, an improved K-means algorithm based on the MapReduce framework is proposed. First, to obtain the initial number of clusters for K-means, the data set is clustered with agglomerative hierarchical clustering and the result is given a preliminary evaluation using the silhouette coefficient; the number of clusters obtained in this way is then used to set the initial cluster centers of the K-means algorithm. Second, to make the method suitable for clustering massive data, the improved K-means algorithm is deployed on the MapReduce framework. Experimental results show that, on a single machine, the proposed method achieves high precision and recall together with strong clustering stability; on a cluster, it also achieves a good speedup ratio and running speed.
Keywords:MapReduce framework  K-means algorithm  data mining  clustering analysis
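
The second step in the abstract, running K-means on MapReduce, is commonly structured as one map/reduce job per iteration. The following in-memory sketch only simulates that pattern and is not the authors' code: it does not use the Hadoop API, and the function names, the `splits` input layout, and the stopping tolerance are illustrative assumptions.

```python
import math
from collections import defaultdict


def kmeans_map(point, centers):
    """Map: emit (index of the nearest center, (point, 1))."""
    nearest = min(range(len(centers)),
                  key=lambda c: math.dist(point, centers[c]))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce: average all points that were assigned to one center."""
    dims = len(values[0][0])
    sums = [0.0] * dims
    count = 0
    for vec, n in values:
        for d in range(dims):
            sums[d] += vec[d]
        count += n
    return tuple(s / count for s in sums)


def kmeans_iteration(splits, centers):
    """One simulated job: map every split, group by key, reduce per key."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for split in splits:         # each split would go to one mapper
        for point in split:
            key, value = kmeans_map(point, centers)
            grouped[key].append(value)
    # A center that received no points keeps its previous position.
    return [kmeans_reduce(grouped[c]) if c in grouped else centers[c]
            for c in range(len(centers))]


def kmeans_mapreduce(splits, centers, max_iter=20, tol=1e-6):
    """Repeat jobs until no center moves more than tol, or max_iter is hit."""
    for _ in range(max_iter):
        new_centers = kmeans_iteration(splits, centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


# Two input splits, with starting centers assumed to come from a prior
# initialization step.
splits = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
final_centers = kmeans_mapreduce(splits, [(1, 1), (9, 1)])
```

Because each reducer only needs per-cluster sums and counts, a real deployment could add a combiner that pre-aggregates these on each mapper to cut shuffle traffic; whether the paper does so is not stated in the abstract.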