Analysis and Study of the C4.5 Algorithm Based on the Hadoop Platform
Citation:SUN Yuan, HUANG Gang. Analysis and Study of the C4.5 Algorithm Based on the Hadoop Platform[J]. Computer Technology and Development, 2014(11):83-86.
Authors:SUN Yuan  HUANG Gang
Affiliation:School of Computer, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China
Funding:Supported by the National Natural Science Foundation of China
Abstract:How to mine valuable information from massive data faster, more efficiently, and at lower cost has become a new challenge for data mining technology. Based on a study of the characteristics of the Hadoop platform and of the C4.5 decision tree algorithm, this paper brings cloud computing ideas to decision tree algorithms: it parallelizes C4.5 on the Hadoop platform and uses the MapReduce model to address massive data mining. The new algorithm is then validated on the play-golf data set. The experimental results show that, for massive data, the Hadoop-based decision tree algorithm clearly improves data mining efficiency and offers good scalability. To some extent it also relieves the heavy computation and long tree-construction time that C4.5 incurs when processing massive data.

Keywords:Hadoop  MapReduce  data mining  C4.5 algorithm
This article is indexed in VIP, Wanfang Data, and other databases.
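
The abstract describes parallelizing C4.5 with MapReduce but this record does not show how. As a rough illustration only, the sketch below simulates one common way to do it in a single Python process: a map step emits a count for every (attribute, value, class) triple, a reduce step sums the counts, and the C4.5 gain ratio is computed from the totals to pick a split attribute. The records, attribute names, and function names are all hypothetical and are not taken from the paper; a real Hadoop job would implement the two steps as Mapper and Reducer classes.

```python
import math
from collections import Counter
from itertools import chain

# Toy records, invented for illustration (NOT the paper's data set).
# Each record is (outlook, windy, play); the last field is the class label.
RECORDS = [
    ("sunny", "false", "no"),
    ("sunny", "true", "no"),
    ("overcast", "false", "yes"),
    ("overcast", "true", "yes"),
    ("rainy", "false", "yes"),
    ("rainy", "true", "no"),
]
ATTRIBUTES = ["outlook", "windy"]


def map_record(record):
    """Map step: emit ((attribute, value, label), 1) for each attribute."""
    *values, label = record
    for name, value in zip(ATTRIBUTES, values):
        yield (name, value, label), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts that share the same key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def entropy(counts):
    """Shannon entropy (base 2) of a collection of non-negative counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def gain_ratio(totals, attribute):
    """C4.5 gain ratio of one attribute, computed from the reduced counts."""
    by_value = {}       # attribute value -> Counter(label -> count)
    labels = Counter()  # label -> count over all records
    for (name, value, label), count in totals.items():
        if name != attribute:
            continue
        by_value.setdefault(value, Counter())[label] += count
        labels[label] += count
    n = sum(labels.values())
    remainder = sum(
        sum(c.values()) / n * entropy(c.values()) for c in by_value.values()
    )
    gain = entropy(labels.values()) - remainder
    split_info = entropy(sum(c.values()) for c in by_value.values())
    return gain / split_info if split_info else 0.0


def best_attribute(records):
    """Run the simulated map and reduce steps, then pick the best split."""
    totals = reduce_counts(chain.from_iterable(map(map_record, records)))
    return max(ATTRIBUTES, key=lambda a: gain_ratio(totals, a))


if __name__ == "__main__":
    print(best_attribute(RECORDS))
```

On this toy data the sketch selects `outlook` as the split attribute. Only the counting is distributed in this scheme; the gain ratio itself is cheap to compute once the counts have been reduced, which is why the counting step is the natural one to hand to MapReduce.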