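Mining Frequent Patterns over Data Stream under the Time Decaying Model
Cite this article:WU Feng, ZHONG Yan, WU Quan-Yuan. Mining Frequent Patterns over Data Stream under the Time Decaying Model[J]. Acta Automatica Sinica, 2010, 36(5): 674-684.
Authors:WU Feng  ZHONG Yan  WU Quan-Yuan
Affiliation:1.School of Computer, National University of Defense Technology, Changsha 410073
Funding:Supported by the National High Technology Research and Development Program of China (863 Program) (2006AA01Z451, 2007AA01Z474)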
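Abstract:Frequent pattern mining is an important research topic in data stream mining. To address the timeliness of data streams and the drift of the stream center, this paper proposes a data stream frequent pattern mining algorithm that combines the landmark window model with the time decay model. The algorithm dynamically builds a global pattern tree and uses a time-based exponential decay function to accumulate the support count of each pattern in the tree, thereby characterizing how frequent each pattern is within the landmark window. Furthermore, to reduce space overhead effectively, a pruning threshold function is designed to promptly remove from the global tree any pattern that is not expected to grow into a frequent pattern. The important parameters and thresholds that appear in the algorithm are analyzed in depth. A series of experiments shows that, compared with the existing comparable algorithm MSW, the proposed algorithm achieves high mining precision (over 90% on average), has low memory overhead, is fast enough to meet the processing requirements of high-speed data streams, and adapts to data streams with different numbers of transactions, different average transaction lengths, and different average lengths of maximal potentially frequent patterns.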

Keywords:Data stream  frequent pattern mining  data mining  time decaying model
Received:2008-12-15
Revised:2009-11-6

Mining Frequent Patterns over Data Stream under the Time Decaying Model
Cite this article:WU Feng, ZHONG Yan, WU Quan-Yuan. Mining Frequent Patterns over Data Stream under the Time Decaying Model[J]. Acta Automatica Sinica, 2010, 36(5): 674-684.
Authors:WU Feng  ZHONG Yan  WU Quan-Yuan
Affiliation:1.School of Computer, National University of Defense Technology, Changsha 410073
Abstract:Frequent pattern mining is an important task in mining data streams. Considering the timeliness of data streams and the shift of the stream center, we propose a data stream frequent pattern mining algorithm, named DFPMiner, which combines the landmark window with the time decaying model. By dynamically constructing a global pattern tree, the method uses a time-based exponential decay function to accumulate the support of each pattern, which describes how frequent the pattern is within the landmark window. Moreover, to reduce the space cost effectively, a pruning threshold function is constructed and used to promptly remove from the global tree any pattern that is not expected to become frequent. This paper analyzes in depth the important parameters and thresholds in the algorithm. The experimental results show that, compared with the MSW algorithm, DFPMiner has higher mining precision (over 90% on average) and lower storage cost, and its execution speed meets the requirements of processing high-speed data streams. The method can be applied to streams with different numbers of transactions, different average transaction sizes, and different maximal potentially frequent pattern sizes.
Keywords:Data stream  frequent pattern mining  data mining  time decaying model
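
The abstract describes two mechanisms: support counts that decay exponentially over time, and a pruning threshold that drops patterns unlikely to become frequent. The following Python sketch illustrates that general idea only. It is not DFPMiner: the abstract gives neither the exact decay function nor the pruning threshold function, so the class name `DecayedPatternCounter`, the parameters `decay`, `min_support`, `prune_ratio` and `max_len`, and the rule "prune below `prune_ratio * min_support * total`" are all assumptions made for illustration. A flat dictionary also stands in for the global pattern tree, and time is assumed to advance by one unit per transaction.

```python
from itertools import combinations


class DecayedPatternCounter:
    """Toy decayed-support counter (hypothetical, not the paper's DFPMiner)."""

    def __init__(self, decay=0.5, min_support=0.5, prune_ratio=0.1, max_len=2):
        self.decay = decay              # assumed per-transaction decay factor in (0, 1)
        self.min_support = min_support  # relative support needed to be frequent
        self.prune_ratio = prune_ratio  # assumed fraction of min_support used for pruning
        self.max_len = max_len          # longest pattern that is enumerated
        self.now = 0                    # number of transactions seen so far
        self.total = 0.0                # decayed number of transactions
        self.counts = {}                # pattern -> (decayed count, time of last update)

    def _decayed(self, pattern):
        # Decay lazily: apply only the time elapsed since the last update.
        count, last = self.counts[pattern]
        return count * self.decay ** (self.now - last)

    def add(self, transaction):
        self.now += 1
        self.total = self.total * self.decay + 1.0
        items = sorted(set(transaction))
        for size in range(1, min(self.max_len, len(items)) + 1):
            for pattern in combinations(items, size):
                old = self._decayed(pattern) if pattern in self.counts else 0.0
                self.counts[pattern] = (old + 1.0, self.now)
        self._prune()

    def _prune(self):
        # Assumed pruning rule: drop patterns far below the frequency threshold.
        threshold = self.prune_ratio * self.min_support * self.total
        stale = [p for p in self.counts if self._decayed(p) < threshold]
        for pattern in stale:
            del self.counts[pattern]

    def frequent(self):
        # Patterns whose decayed count reaches min_support of the decayed total.
        threshold = self.min_support * self.total
        return {p: self._decayed(p) for p in self.counts
                if self._decayed(p) >= threshold}


miner = DecayedPatternCounter()
for tx in [["a", "b"], ["a", "b"], ["c"], ["c"], ["c"]]:
    miner.add(tx)
print(miner.frequent())
```

In this run the stream shifts from `a, b` to `c`: the older patterns lose weight with every new transaction, so only `("c",)` is reported as frequent, and a further `c` transaction pushes `("a",)` below the assumed pruning threshold so that it is deleted.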
This article is indexed in CNKI and other databases.