A Dynamic Tree Incremental Updating Method Based on Hadoop
Citation: YAN Yi-ming, GUO Xin. A Dynamic Tree Incremental Updating Method Based on Hadoop[J]. Computer Engineering, 2014(3): 67-70, 92.
Authors: YAN Yi-ming, GUO Xin
Affiliation: School of Software and Service Outsourcing, Jishou University, Zhangjiajie, Hunan 427000, China
Funding: Hunan Provincial Industrial Support Program (2012GK2006); Scientific Research Fund of the Hunan Provincial Education Department (12C0291).
Abstract: Data mining in real-world environments involves large data volumes, complex workflows, and intensive computation. To improve the efficiency of traditional incremental tree-update mining and to replace the serial execution of existing algorithms, this paper proposes a dynamic tree incremental updating method based on Hadoop. It first introduces basic concepts of cloud computing, the cloud model, and the execution flow. Because task scheduling on the existing Hadoop platform uses a random allocation strategy, the paper then designs a resource scheduling and allocation algorithm for a dynamic cloud platform, with the aim of minimizing cost. Finally, it presents a tree incremental updating mining algorithm together with two parallel algorithms, DeleteFreqTree and FindNewTree, which carry out the incremental mining of tree data. Experimental results show that the parallel algorithms are feasible and efficient, scale well, and can perform update mining on massive tree data.

Keywords: data mining; database; cloud computing; concurrency control; frequent subtree; incremental updating
This article is indexed in VIP (维普) and other databases.