
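Improvement of the Apriori Algorithm under the MapReduce Framework
Cite this article: WANG Xin, WANG Yu-hong, YU Jiao, GE Dong-mei. Improvement of the Apriori algorithm under the MapReduce framework [J]. Journal of Heilongjiang Institute of Technology (黑龙江工程学院学报), 2014(2): 70-74.
Authors: WANG Xin (王鑫), WANG Yu-hong (王喻红), YU Jiao (于娇), GE Dong-mei (葛冬梅)
Affiliation: College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin 150050, Heilongjiang, China
Funding: Natural Science Foundation of Heilongjiang Province (F201224)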
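Abstract: Mining massive data with the traditional Apriori algorithm wastes a large amount of storage space and communication resources, which makes the algorithm inefficient. This paper therefore proposes an improved Apriori algorithm under the MapReduce framework. The transaction database is first partitioned horizontally into n independent data blocks, which are then sent to m worker nodes allocated by dynamic load balancing. Each node scans its own data block and generates local candidate frequent itemsets. The support of each candidate itemset is then computed and compared with the minimum support threshold to determine the final frequent itemsets. The improved algorithm reduces data movement between nodes and needs only two scans of the transaction database to mine all frequent itemsets, which saves scan time and storage space and improves mining efficiency.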

Keywords: Hadoop; association rules; Apriori algorithm; MapReduce framework

The improvement of Apriori algorithm under the framework of MapReduce
WANG Xin, WANG Yu-hong, YU Jiao, GE Dong-mei. The improvement of Apriori algorithm under the framework of MapReduce[J]. Journal of Heilongjiang Institute of Technology, 2014(2): 70-74.
Authors: WANG Xin, WANG Yu-hong, YU Jiao, GE Dong-mei
Affiliation: College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin 150050, China
Abstract: When applied to massive data, the traditional Apriori algorithm wastes a large amount of storage space and communication resources, which leads to low efficiency. An improved Apriori algorithm under the MapReduce framework is proposed. First, the transaction database is divided into n independent data blocks by horizontal partitioning, and the blocks are sent to m worker nodes allocated by dynamic load balancing. Each node scans its own data block and generates local candidate frequent itemsets. The support of each candidate itemset is then computed and compared with the minimum support threshold to determine the final frequent itemsets. The improved algorithm reduces the flow of data between nodes and finds all frequent itemsets with only two scans of the transaction database, which saves scan time and storage space and improves mining efficiency.
Keywords: Hadoop; association rules; Apriori algorithm; MapReduce framework
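
The two-scan scheme described in the abstract can be illustrated with a minimal single-process sketch in Python. This is not the authors' implementation: the function names (`local_frequent_itemsets`, `count_candidates`, `mine`), the round-robin partitioning, and the use of a support ratio rather than an absolute count are all assumptions made for illustration, and dynamic load balancing across the m worker nodes is omitted. The sketch relies on the fact that an itemset that is frequent in the whole database must be frequent in at least one block, so the union of local results is a complete candidate set.

```python
from collections import Counter
from itertools import combinations


def local_frequent_itemsets(block, min_support_ratio):
    """Scan 1 (map side): level-wise Apriori restricted to one data block."""
    threshold = min_support_ratio * len(block)
    # Frequent 1-itemsets within this block.
    counts = Counter(frozenset([item]) for t in block for item in t)
    current = {s for s, n in counts.items() if n >= threshold}
    frequent = set(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must be locally frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        counts = Counter(c for t in block for c in candidates if c <= t)
        current = {c for c, n in counts.items() if n >= threshold}
        frequent |= current
        k += 1
    return frequent


def count_candidates(block, candidates):
    """Scan 2 (map side): count every global candidate in one data block."""
    return Counter(c for t in block for c in candidates if c <= t)


def mine(transactions, n_blocks, min_support_ratio):
    """Hypothetical driver that simulates the two MapReduce passes in-process."""
    transactions = [frozenset(t) for t in transactions]
    # Horizontal partitioning: each block holds whole transactions.
    blocks = [transactions[i::n_blocks] for i in range(n_blocks)]
    # Scan 1: the union of local frequent itemsets forms the global candidates.
    candidates = set()
    for block in blocks:
        candidates |= local_frequent_itemsets(block, min_support_ratio)
    # Scan 2: sum the per-block counts (reduce side) and apply the threshold.
    totals = Counter()
    for block in blocks:
        totals.update(count_candidates(block, candidates))
    threshold = min_support_ratio * len(transactions)
    return {s: n for s, n in totals.items() if n >= threshold}


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"],
            ["b", "c"], ["a", "b", "c"], ["a"]]
    for itemset, support in mine(data, n_blocks=2, min_support_ratio=0.5).items():
        print(sorted(itemset), support)
```

On a Hadoop cluster, each of the two loops over `blocks` would presumably correspond to one MapReduce job, with the per-block work done by mappers and the union or summation done by reducers. Only the candidate set and the per-block counts would need to move between nodes, which is consistent with the reduced data flow the abstract claims.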
This article is indexed in VIP (维普) and other databases.