Parallel Ensemble Classification Algorithm Based on the MapReduce Technology
Citation: Ju Chunhua, Zou Jiangbo, Zhang Rui, Wei Jianliang. Parallel Ensemble Classification Algorithm Based on the MapReduce Technology[J]. Telecommunications Science, 2012, 28(7): 40-47.
Authors: Ju Chunhua, Zou Jiangbo, Zhang Rui, Wei Jianliang
Affiliations: 1. School of Computer Science & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China; Center for Studies of Modern Business, Zhejiang Gongshang University, Hangzhou 310000, China
2. School of Computer Science & Information Engineering, Zhejiang Gongshang University, Hangzhou 310018, China
Funding: National Natural Science Foundation of China; Key Project of the Zhejiang Provincial Natural Science Foundation; Doctoral Program Foundation of the Ministry of Education of China; Zhejiang Provincial Major Science and Technology Program; Zhejiang Provincial Natural Science Foundation; Zhejiang Provincial Graduate Research Innovation Project
Abstract: Because computer memory is limited, the effectiveness and optimal selection of classifier combinations is a major research topic in machine learning. Classic ensemble classification algorithms achieve high accuracy on small data sets, but on large volumes of data they run inefficiently, because the multiple base classifiers all train and classify on the resources of a single computer. This makes them unsuitable for today's massive data. To address the fact that existing ensemble classification algorithms suit only small-scale data sets, this paper analyzes the characteristics of ensemble classifiers and designs a parallel ensemble classification algorithm (EMapReduce) that combines an aggregation-based ensemble classifier with the MapReduce technology of cloud computing, so that large-scale data can be processed in parallel. Simulation experiments on an Amazon compute cluster indicate that the algorithm is reasonably efficient and feasible.

Keywords: cloud computing; ensemble classifier; parallel ensemble; MapReduce
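
The abstract names the ingredients (an aggregation-based ensemble plus MapReduce) but gives no details of EMapReduce itself. The following is only a minimal sketch of the general pattern, not the authors' algorithm: a map step trains one base classifier per data partition, and a reduce step aggregates the base classifiers' votes by majority. The nearest-centroid base learner, the toy data, and the names `map_train`, `predict_one`, `reduce_vote`, and `ensemble_predict` are all assumptions made for illustration, and the map calls run sequentially here rather than on a cluster.

```python
from collections import Counter


def map_train(partition):
    """Map phase (illustrative): fit a nearest-centroid classifier on one partition.

    partition is a list of (features, label) pairs; returns {label: centroid}.
    """
    sums, counts = {}, Counter()
    for features, label in partition:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict_one(model, features):
    """Classify a sample with one base classifier (closest centroid wins)."""
    def sq_dist(centroid):
        return sum((c - x) ** 2 for c, x in zip(centroid, features))
    return min(model, key=lambda label: sq_dist(model[label]))


def reduce_vote(votes):
    """Reduce phase (illustrative): aggregate votes by simple majority."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(models, features):
    """Collect one vote per base classifier, then reduce them to one label."""
    return reduce_vote([predict_one(m, features) for m in models])


# Hypothetical toy data, split into three partitions that stand in for input splits.
partitions = [
    [((0.0, 0.1), "a"), ((1.0, 0.9), "b")],
    [((0.2, 0.0), "a"), ((0.9, 1.1), "b")],
    [((0.1, 0.2), "a"), ((1.1, 1.0), "b")],
]
# Each map_train call is independent, so each could run on its own node.
models = list(map(map_train, partitions))
print(ensemble_predict(models, (0.05, 0.05)))  # prints "a"
print(ensemble_predict(models, (0.95, 1.0)))   # prints "b"
```

Because each base classifier sees only its own partition, no single machine has to hold the whole data set in memory, which is the bottleneck the abstract attributes to single-machine ensemble methods.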
This article is indexed in CNKI, Wanfang Data, and other databases.