
A comparison on big data fuzzy K-means algorithm based on MapReduce and Spark

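Cite this article:	ZHAI Junhai, TIAN Shi, ZHANG Sufang, WANG Mohan, SONG Dandan. A comparison on big data fuzzy K-means algorithm based on MapReduce and Spark[J]. Journal of Hebei University (Natural Science Edition), 2020, 40(4): 433-440.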

Authors:	ZHAI Junhai, TIAN Shi, ZHANG Sufang, WANG Mohan, SONG Dandan

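Affiliations:	1. Hebei Key Laboratory of Machine Learning and Computational Intelligence, College of Mathematics and Information Science, Hebei University, Baoding 071002, Hebei, China (ZHAI Junhai, TIAN Shi, WANG Mohan, SONG Dandan); 2. Hebei Branch of China Meteorological Administration Training Centre, China Meteorological Administration, Baoding 071000, Hebei, China (ZHANG Sufang)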

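Funding:	Graduate Innovation Project; Key Research and Development Program of Hebei Province; Natural Science Foundation of Hebei Province; Hebei Province Graduate Professional Degree Teaching Case Library Construction Project; Education and Teaching Reform Research Project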

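Abstract:	Big data fuzzy K-means algorithms based on MapReduce and Spark are analyzed and compared both in principle and experimentally, and the advantages and disadvantages of the two open-source big data platforms are summarized. Because fuzzy K-means is an iterative algorithm, part of the data must be processed repeatedly to obtain the final clustering result. The two implementations are therefore compared on five aspects: execution time, number of synchronizations, number of files, fault tolerance, and resource consumption. The conclusions offer a useful reference for researchers working on big data, especially those studying big data machine learning.
Keywords:	big data; machine learning; clustering algorithm; fuzzy clustering algorithm; iterative algorithm
Received:	2019-09-09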
This article is indexed in CNKI, Wanfang Data, and other databases.
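The abstract describes fuzzy K-means as an iterative algorithm in which part of the data is processed repeatedly. The following is a minimal pure-Python sketch, not the authors' code, assuming the standard fuzzy c-means updates with fuzzifier m > 1: memberships u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)) and centers c_j = Σ_i u_ij^m x_i / Σ_i u_ij^m. Each iteration is written as a map step (per-point partial sums) followed by a reduce step (per-cluster totals), which is roughly how one iteration corresponds to one MapReduce job. All function names and parameters (m, tol, init, seed) are hypothetical choices made for illustration.

```python
from functools import reduce
from math import dist
import random


def memberships(x, centers, m):
    """Standard fuzzy c-means membership of point x in each cluster."""
    d = [dist(x, c) for c in centers]
    # A point lying exactly on a center belongs entirely to that cluster.
    for j, dj in enumerate(d):
        if dj == 0.0:
            return [1.0 if k == j else 0.0 for k in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]


def map_point(x, centers, m):
    """Map step: per-cluster partial sums (u^m * x, u^m) for one point."""
    weights = [u ** m for u in memberships(x, centers, m)]
    return [([w * xi for xi in x], w) for w in weights]


def reduce_partials(a, b):
    """Reduce step: add two lists of per-cluster partial sums."""
    return [([p + q for p, q in zip(num_a, num_b)], den_a + den_b)
            for (num_a, den_a), (num_b, den_b) in zip(a, b)]


def fuzzy_kmeans(points, k, m=2.0, max_iter=100, tol=1e-6, init=None, seed=0):
    """Return (centers, iterations). Hypothetical helper, not the paper's code."""
    start = init if init is not None else random.Random(seed).sample(list(points), k)
    centers = [list(c) for c in start]
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # One full pass over the points ends in one synchronization barrier.
        # In Hadoop MapReduce this is typically a separate job that re-reads
        # the input and writes new centers to files; Spark can cache the points.
        totals = reduce(reduce_partials, (map_point(x, centers, m) for x in points))
        new_centers = [[v / den for v in num] for num, den in totals]
        shift = max(dist(c, nc) for c, nc in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, n_iter
```

For example, `fuzzy_kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2, init=[(0, 0), (10, 10)])` should return two centers close to (1/3, 1/3) and (31/3, 31/3). Because the reduce step only adds partial sums, it is associative, so the same pair of functions could in principle run as one Hadoop job per iteration or over a cached Spark RDD using `map` and `reduce`; the per-iteration synchronization and intermediate files are among the aspects the paper compares.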