Parallel K-means Algorithm Based on MapReduce and Its Improvement

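Cite this article:	YI Zhi-An, WANG Yue. Parallel K-means Algorithm Based on MapReduce and Its Improvement[J]. Computer Systems & Applications, 2015, 24(6): 188-192.

Authors:	YI Zhi-An, WANG Yue

Affiliation:	College of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China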

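Abstract:	To address the insufficient memory and slow computation that the traditional k-means clustering algorithm faces when processing massive data, a parallel k-means algorithm based on MapReduce is proposed. In addition, to reduce the blindness of k-means in determining its initial values, the canopy algorithm is used to improve it. Experimental results show that both the MapReduce-based parallel k-means algorithm and the improved algorithm produce good clustering results: the improvement not only raises clustering quality but, on large datasets, also lets the improved algorithm achieve a near-linear speedup.
Keywords:	MapReduce; k-means algorithm; canopy algorithm; parallel computing; clustering
Received:	2014/10/11
Revised:	2014/11/13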
Parallel K-Means Algorithm and Improved Based on MapReduce

YI Zhi-An and WANG Yue. Parallel K-Means Algorithm and Improved Based on MapReduce[J]. Computer Systems & Applications, 2015, 24(6): 188-192.

Authors:	YI Zhi-An and WANG Yue

Affiliation:	College of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China

Abstract:	The traditional k-means clustering algorithm runs out of memory and computes slowly when it handles massive data. To address these problems, this paper proposes a parallel k-means algorithm based on MapReduce. To reduce the blindness of k-means in choosing its initial values, the canopy algorithm is used to improve it. The experimental results show that the MapReduce-based parallel k-means algorithm clusters well both before and after the improvement. The improvement raises clustering quality, and on large datasets the speedup of the improved algorithm approaches linear.

Keywords:	MapReduce; k-means algorithm; canopy algorithm; parallel computation; clustering
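
The abstract gives no implementation details, so the following is only a minimal single-process Python sketch of the two ideas it names: a canopy-style pass that picks initial centers, and one k-means iteration written as a map step and a reduce step. It assumes Euclidean distance and a single tight threshold `t2` (the loose canopy threshold is omitted because only centers are needed here). All function names, the threshold value, and the sample points are hypothetical and do not come from the paper, and no Hadoop API is used.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t2):
    """Canopy-style seeding (assumed variant): take the first remaining
    point as a center, then drop every point within distance t2 of it."""
    remaining = list(points)
    centers = []
    while remaining:
        center = remaining[0]
        centers.append(center)
        remaining = [p for p in remaining if dist(p, center) > t2]
    return centers


def map_point(point, centers):
    """Map step: emit (index of nearest center, (point, 1))."""
    idx = min(range(len(centers)), key=lambda i: dist(point, centers[i]))
    return idx, (point, 1)


def reduce_cluster(values):
    """Reduce step: average all points that share one center index."""
    count = 0
    total = None
    for vec, n in values:
        total = vec if total is None else tuple(s + v for s, v in zip(total, vec))
        count += n
    return tuple(s / count for s in total)


def kmeans_iteration(points, centers):
    """One map -> shuffle -> reduce round; empty clusters keep their center."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_point(p, centers)
        groups[key].append(value)  # stands in for the shuffle phase
    return [
        reduce_cluster(groups[i]) if i in groups else centers[i]
        for i in range(len(centers))
    ]


def kmeans(points, centers, max_iter=20, tol=1e-6):
    """Repeat rounds until no center moves more than tol."""
    for _ in range(max_iter):
        new_centers = kmeans_iteration(points, centers)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


# Hypothetical sample data: two well-separated groups of three points.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds = canopy_centers(sample, t2=3.0)
final_centers = kmeans(sample, seeds)
```

In a real MapReduce job the map and reduce functions would run on separate workers and each iteration would be one job, with the current centers distributed to every mapper. The sketch keeps everything in one process so the data flow is easy to follow.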
This article is indexed in Wanfang Data and other databases.