Optimization method of distributed K-means algorithm based on Spark

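Cite this article:	WANG Fa-yu, LIU Zhi-qiang. Optimization method of distributed K-means algorithm based on Spark[J]. Computer Engineering and Design, 2019, 40(6): 1595-1600.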

Authors:	WANG Fa-yu (王法玉), LIU Zhi-qiang (刘志强)

Affiliation:	Key Laboratory of Tianjin City of Intelligent Computing and Software New Technology, Tianjin University of Technology, Tianjin 300384, China (both authors)

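Funding:	National Natural Science Foundation of China; Natural Science Foundation of Tianjin; Tianjin Research Program for Undergraduate Teaching Quality and Teaching Reform in Regular Institutions of Higher Education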

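Abstract:	To address the low computational efficiency and high time complexity of the traditional K-means algorithm on massive data, an improved K-means algorithm based on the Spark computing framework is proposed. Grid cells store the spatial position of the data points, and redundant computation is reduced by exploiting their spatial relationship to the cluster centers. To improve the algorithm's capacity for massive data, it is parallelized on the Spark framework. Tests in a cluster environment show that the improved Spark-based algorithm effectively reduces the time complexity of the computation, scales well, and significantly improves computational efficiency.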
Keywords:	K-means algorithm; Spark computing framework; distributed; grid; spatial location
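
The abstract does not spell out how the grid cells are used, so the following is only a minimal single-machine sketch of one standard way a grid can remove redundant distance computations. It assumes a bounding-box filter: for each cell, a center is kept as a candidate only if its smallest possible distance to the cell does not exceed the tightest largest possible distance among all centers. The names `cell_of`, `box_distance_bounds`, `assign_with_grid`, and the `cell_size` parameter are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cell_of(point, cell_size):
    """Integer index of the grid cell that contains the point."""
    return tuple(math.floor(x / cell_size) for x in point)


def box_distance_bounds(cell, cell_size, center):
    """Smallest and largest squared distance from any location in the cell to the center."""
    lo_sq = hi_sq = 0.0
    for idx, c in zip(cell, center):
        low, high = idx * cell_size, (idx + 1) * cell_size
        nearest = min(max(c, low), high)  # center coordinate clamped into the cell
        farthest = low if abs(c - low) > abs(c - high) else high
        lo_sq += (c - nearest) ** 2
        hi_sq += (c - farthest) ** 2
    return lo_sq, hi_sq


def assign_with_grid(points, centers, cell_size):
    """Return (labels, number of point-to-center distance evaluations)."""
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[cell_of(p, cell_size)].append(i)

    labels = [None] * len(points)
    evaluations = 0
    for cell, members in cells.items():
        bounds = [box_distance_bounds(cell, cell_size, c) for c in centers]
        best_upper = min(hi for _, hi in bounds)
        # Assumed pruning rule: a center whose lower bound exceeds the tightest
        # upper bound cannot be the nearest center of any point in this cell.
        candidates = [k for k, (lo, _) in enumerate(bounds) if lo <= best_upper]
        if len(candidates) == 1:
            # The whole cell belongs to one center; no per-point distances needed.
            for i in members:
                labels[i] = candidates[0]
            continue
        for i in members:
            evaluations += len(candidates)
            labels[i] = min(candidates, key=lambda k: sq_dist(points[i], centers[k]))
    return labels, evaluations


if __name__ == "__main__":
    rng = random.Random(42)
    demo_points = [(rng.uniform(0, 4), rng.uniform(0, 4)) for _ in range(1000)]
    demo_centers = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (3.0, 3.0)]
    _, used = assign_with_grid(demo_points, demo_centers, 0.25)
    print(used, "distance evaluations instead of", len(demo_points) * len(demo_centers))
```

Under this rule the assignment is the same as a brute-force nearest-center search; only cells that straddle a boundary between clusters still pay for per-point distances. The cell size trades bookkeeping against pruning power, and the value used in the demo is arbitrary.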
Optimization method of distributed K-means algorithm based on Spark

WANG Fa-yu,LIU Zhi-qiang.Optimization method of distributed K-means algorithm based on Spark[J].Computer Engineering and Design,2019,40(6):1595-1600.

Authors:	WANG Fa-yu, LIU Zhi-qiang

Affiliation:	(Key Laboratory of Tianjin City of Intelligent Computing and Software New Technology,Tianjin University of Technology, Tianjin 300384, China)

Abstract:	To address the low computational efficiency and high time complexity of the traditional K-means algorithm when processing massive data, an improved K-means algorithm based on the Spark computing framework is proposed. Grid cells are used to store the spatial position of the data points, and redundant calculations are reduced by exploiting their spatial relationship to the cluster centers. To improve the algorithm's ability to process massive data, the Spark framework is used to parallelize it. Tests in a cluster environment show that the improved Spark-based algorithm effectively reduces the computational time complexity, scales well, and significantly improves computational efficiency.

Keywords:	K-means algorithm; Spark computing framework; distributed; grid; spatial location
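
The abstract states that the algorithm is parallelized on Spark but gives no implementation details. As a hedged illustration of the usual data-parallel structure of one K-means iteration, the sketch below uses plain Python lists as stand-ins for partitions; Spark itself is not invoked. The functions `partial_sums`, `merge`, and `kmeans_step` are hypothetical names; in PySpark, functions of this shape are the kind one would pass to `RDD.mapPartitions` and `RDD.reduce`. This is an assumption about structure, not the authors' code.

```python
from functools import reduce


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def partial_sums(partition, centers):
    """Per-partition step: coordinate sums and point counts for each nearest center."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in partition:
        k = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def merge(a, b):
    """Combine step: add two partial results element-wise."""
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    counts = [x + y for x, y in zip(a[1], b[1])]
    return sums, counts


def kmeans_step(partitions, centers):
    """One iteration: map over partitions, reduce, then recompute the centers."""
    sums, counts = reduce(merge, (partial_sums(part, centers) for part in partitions))
    return [
        tuple(s / n for s in row) if n else tuple(old)  # an empty cluster keeps its center
        for row, n, old in zip(sums, counts, centers)
    ]


if __name__ == "__main__":
    parts = [[(0, 0), (0, 2)], [(10, 10), (10, 12)], [(0, 1)]]
    print(kmeans_step(parts, [(1.0, 1.0), (9.0, 9.0)]))
```

Only the small per-center sums and counts cross partition boundaries, which is why this step parallelizes well; the per-partition assignment is where a grid-based pruning rule would plug in.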
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).