首页 | 官方网站   微博 | 高级检索  
     

挖掘滑动窗口中的数据流频繁项算法
引用本文:屠莉,陈崚,包芳.挖掘滑动窗口中的数据流频繁项算法[J].小型微型计算机系统,2012,33(5):940-949.
作者姓名:屠莉  陈崚  包芳
作者单位:1. 江苏省信息融合软件工程技术研发中心,江苏江阴214405;江阴职业技术学院计算机科学系,江苏江阴214405
2. 扬州大学计算机科学与工程系,江苏扬州225009;南京大学南京大学计算机软件新技术国家重点实验室,南京210093
基金项目:国家自然科学基金,江苏省自然科学基金,江苏省教育厅自然科学基金,江苏省信息融合软件工程技术研发中心基金,江苏省普通高校研究生科研创新计划项目
摘    要:滑动窗口是一种对最近一段时间内的数据进行挖掘的有效的技术,本文提出一种基于滑动窗口的流数据频繁项挖掘算法.算法采用了链表队列策略大大简化了算法,提高了挖掘的效率.对于给定的阈值S、误差ε和窗口长度n,算法可以检测在窗口内频度超过Sn的数据流频繁项,且使误差在εn以内.算法的空间复杂度为O(ε-1),对每个数据项的处理和查询时间均为O(1).在此基础上,我们还将该算法进行了扩展,可以通过参数的变化得到不同的流数据频繁项挖掘算法,使得算法的时间和空间复杂度之间得到调节.通过大量的实验证明,本文算法比其它类似算法具有更好的精度以及时间和空间效率.

关 键 词:数据流  频繁项  滑动窗口

Mining Frequent Items in a Stream Based on Sliding Window
TU Li , CHEN Ling , BAO Fang.Mining Frequent Items in a Stream Based on Sliding Window[J].Mini-micro Systems,2012,33(5):940-949.
Authors:TU Li  CHEN Ling  BAO Fang
Affiliation:1,2 1(Information Fusion Software Engineering Research and Development Center of Jiangsu Province,Jiangyin 214405,China) 2(Department of Computer Science,Jiangyin Polytechnic College,Jiangyin 214405,China) 3(Department of Computer Science and Engineering,Yangzhou University,Yangzhou 225009,China) 4(State Key Lab of Novel Software Technology,Nanjing University,Nanjing 210093,China)
Abstract:The sliding window is an effective approach to mine frequent data itmes in the recent period of time.proposed an algorithm for mining frequent items in a stream based on sliding window.The algorithm adopts linked queue strategy which greatly improves the efficiency of the algorithm.Given a threshold S,an error bound ε and the length of the sliding window n,our algorithm can determinately detect the data items within the current window whose frequncy exceeds Sn with an error less than εn using O(ε-1) memory space,and the processing time for each data item and the query time are both O(1).Based on this algorithm,we have proposed a general framework for mining frequent items in data stream based on sliding window.Under this framework,different algorithms can be constructed by changing the parameters which could adjust the time complexity and space cost of the algorithm.Through extensive experiments,we show that our algorithm outperforms other methods in terms of the accuracy,memory requirement,and processing speed.
Keywords:data stream  frequent item  sliding window
本文献已被 CNKI 万方数据 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号