
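Research on Global Sequential Pattern Mining Techniques in a Distributed Environment
Cite this article: HU Kong-fa, ZHANG Chang-hai, CHEN Ling, SONG Ai-bo, DA Qing-li. Research on global sequential pattern mining techniques in a distributed environment[J]. Computer Integrated Manufacturing Systems, 2007, 13(11): 2229-2235.
Authors: HU Kong-fa  ZHANG Chang-hai  CHEN Ling  SONG Ai-bo  DA Qing-li
Affiliations: [1] Department of Computer Science and Engineering, Yangzhou University, Yangzhou 225009, Jiangsu, China [2] School of Economics and Management, Southeast University, Nanjing 210096, Jiangsu, China [3] School of Computer Science and Engineering, Southeast University, Nanjing 210096, Jiangsu, China
Funding: National Natural Science Foundation of China; Ministry of Science and Technology program for building basic science and technology infrastructure platforms; Natural Science Foundation of Jiangsu Province; Qing Lan Project of the Jiangsu Provincial Department of Education
Abstract: Mining global sequential patterns in a distributed environment often generates too many candidate sequences, which increases the network communication cost. To address this, a fast algorithm for mining global sequential patterns in a distributed environment is proposed. The algorithm compresses the local sequential patterns obtained at each site into a lexicographic sequence tree, which avoids transmitting the same sequence prefixes repeatedly. Exploiting the regular and simple form of the node sequences in the merged tree, it introduces an item-extension and sequence-extension pruning strategy that effectively reduces the number of candidate sequences and the volume of network traffic, so that global sequential patterns are generated quickly. Theoretical analysis and experiments show that the algorithm performs well on large data sets and mines global sequential patterns effectively.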

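Keywords: data mining  global sequential pattern  lexicographic sequence tree  item-extension and sequence-extension pruning  distributed  environment  sequential pattern mining  technical research  algorithm performance  large data sets  experiment  theory  fast generation  network transmission volume  reduction  pruning strategy  sequence extension  sequence rules  nodes  merged tree  prefix  sequence tree
Article ID: 1006-5911(2007)11-2229-07
Received: 2007-01-04
Revised: 2007-04-10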

Global sequential pattern mining in distributed environment
HU Kong-fa, ZHANG Chang-hai, CHEN Ling, SONG Ai-bo, DA Qing-li. Global sequential pattern mining in distributed environment[J]. Computer Integrated Manufacturing Systems, 2007, 13(11): 2229-2235.
Authors:HU Kong-fa  ZHANG Chang-hai  CHEN Ling  SONG Ai-bo  DA Qing-li
Abstract: Sequential pattern mining algorithms in a distributed environment generate too many candidate sequences, which increases communication overhead. To deal with this problem, a new algorithm for distributed systems, Fast Mining of Global Sequential Pattern (FMGSP), is proposed. The core idea of the algorithm is to compress local frequent sequential patterns into the corresponding lexicographic sequence tree so as to avoid transmitting repeated prefixes. Based on the regular and simple sequences of the merged trees, a new pruning method named Item Extension and Sequence Extension (I/S-E) pruning is presented to prune candidate sequences effectively. As a result, communication overhead is significantly reduced and global sequential patterns are generated quickly. Theoretical analysis and experiments show that FMGSP performs well and is especially effective for mining global sequential patterns from large amounts of data.
Keywords:data mining  global sequential pattern  lexicographic sequence tree  item extension and sequence extension pruning
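
The prefix-compression idea in the abstract can be illustrated with a small sketch. This is not the FMGSP implementation, whose details are not given on this page. It assumes a sequence is an ordered tuple of itemsets and that each item is encoded as one extension step: "S" when it opens a new itemset (sequence extension), "I" when it joins the current itemset (item extension). The names `tokens`, `Node`, `build_tree`, `count_nodes`, `flat_cost`, and the sample patterns are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Sequence = Tuple[Tuple[str, ...], ...]  # ordered itemsets, each a tuple of items
Token = Tuple[str, str]                 # ("S" or "I", item); assumed encoding


def tokens(seq: Sequence) -> List[Token]:
    """Flatten a sequence into extension steps.

    The first item of each itemset is an S-step (new itemset);
    the remaining items of that itemset are I-steps.
    """
    out: List[Token] = []
    for itemset in seq:
        for i, item in enumerate(sorted(itemset)):
            out.append(("S" if i == 0 else "I", item))
    return out


class Node:
    """One node of the tree; the path from the root spells out a sequence."""

    def __init__(self) -> None:
        self.children: Dict[Token, "Node"] = {}
        self.support: Optional[int] = None  # set only for stored patterns


def build_tree(patterns: Dict[Sequence, int]) -> Node:
    """Insert every local pattern so that shared prefixes share nodes."""
    root = Node()
    for seq, support in patterns.items():
        node = root
        for tok in tokens(seq):
            node = node.children.setdefault(tok, Node())
        node.support = support
    return root


def count_nodes(node: Node) -> int:
    """Number of non-root nodes, i.e. items sent when shipping the tree."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def flat_cost(patterns: Dict[Sequence, int]) -> int:
    """Number of items sent when every pattern is shipped separately."""
    return sum(len(tokens(seq)) for seq in patterns)


# Made-up local patterns of one site, with made-up support counts.
local_patterns: Dict[Sequence, int] = {
    (("a",),): 5,
    (("a",), ("b",)): 4,
    (("a",), ("b",), ("c",)): 3,
    (("a", "b"),): 3,
    (("a",), ("b",), ("d",)): 2,
}
tree = build_tree(local_patterns)
print(flat_cost(local_patterns), count_nodes(tree))
```

For these five sample patterns the flat encoding carries 11 items, while the tree carries 5 nodes, because the prefixes `a` and `a, b` are stored once. The real saving depends on how much the local patterns of a site overlap.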
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.