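A Mining Algorithm That Generates Frequent Itemsequence Sets by Segmented Scanning
Cite this article: Mao Guojun, Liu Chunnian. A Mining Algorithm That Generates Frequent Itemsequence Sets by Segmented Scanning [J]. Computer Engineering and Applications, 2004, 40(7): 19-21, 202.
Authors: Mao Guojun, Liu Chunnian
Affiliations: 1. Beijing Key Laboratory of Multimedia and Intelligent Software, Beijing 100022
2. College of Computer Science, Beijing University of Technology, Beijing 100022
Funding: National Natural Science Foundation of China (No. 60173014), Beijing Natural Science Foundation (No. 4022003), and funding from the Beijing Municipal Education Commission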
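Abstract: Association rule mining is an important branch of data mining research, and discovering frequent itemsequence sets is a key phase of association rule mining. Over the past decade or so, many algorithms for discovering frequent itemsets have been proposed. In recent years, attention has shifted to algorithms that discover frequent itemsets efficiently in large datasets, particularly by reducing the number of database scans and improving memory utilization. This paper proposes an algorithm called DFISP, which is based on a segmented data-scanning strategy and needs only two database scans to generate the frequent itemsequence sets. Experiments show that DFISP is stable and efficient.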

Keywords: Data Mining  Association Rules  Itemsequences (Itemsequence Sets)  Segmented Data Scanning
Article ID: 1002-8331-(2004)07-0019-03

An Algorithm for Mining Frequent Itemsequences by Partitioning Data
Mao Guojun, Liu Chunnian. An Algorithm for Mining Frequent Itemsequences by Partitioning Data [J]. Computer Engineering and Applications, 2004, 40(7): 19-21, 202.
Authors: Mao Guojun, Liu Chunnian
Abstract: Mining association rules from databases is an important research branch of data mining, and discovering frequent itemsets or itemsequences is a key phase in mining association rules. Many algorithms have been proposed in the literature. Recent research has paid more attention to mining efficiency, including reducing the number of passes over the database, memory usage, and I/O costs. This paper presents a new algorithm for discovering frequent itemsequences, called DFISP, which makes two passes over the database and improves mining efficiency on large databases by using a data-partitioning scan technique. Experimental results show that it keeps memory usage within an acceptable range and achieves satisfactory execution efficiency as the size of the database increases.
Keywords: Data Mining  Association Rules  Itemsequences (Itemsequence Sets)  Data-Partitioning Scan
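
The abstract does not give the details of DFISP, so the following is not the authors' algorithm. It is only a minimal sketch of the generic two-pass, partition-based idea the abstract alludes to: scan the data segment by segment to collect locally frequent candidates, then scan once more to count their global support. The function names (`local_frequent`, `two_pass_frequent`), the `segment_size` parameter, the representation of an itemsequence as a sorted tuple of items, and the brute-force local enumeration are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def local_frequent(segment, min_support):
    """Return the itemsequences (sorted tuples) frequent within one segment.

    Brute-force enumeration of every sub-itemsequence; this is exponential in
    transaction length and is only meant to keep the sketch short.
    """
    threshold = math.ceil(min_support * len(segment))
    counts = Counter()
    for transaction in segment:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            counts.update(combinations(items, size))
    return {seq for seq, n in counts.items() if n >= threshold}


def two_pass_frequent(transactions, min_support, segment_size):
    """Two scans over the data: collect local candidates, then count globally."""
    # Pass 1: read one segment at a time, so only that segment must be in memory.
    # Any globally frequent itemsequence is locally frequent in some segment,
    # so the union of local results is a complete candidate set.
    candidates = set()
    for start in range(0, len(transactions), segment_size):
        segment = transactions[start:start + segment_size]
        candidates |= local_frequent(segment, min_support)

    # Pass 2: count the true support of every candidate over the whole data.
    counts = Counter()
    for transaction in transactions:
        items = set(transaction)
        for seq in candidates:
            if items.issuperset(seq):
                counts[seq] += 1

    threshold = math.ceil(min_support * len(transactions))
    return {seq: n for seq, n in counts.items() if n >= threshold}
```

Under these assumptions, `two_pass_frequent(transactions, 0.5, 3)` would treat `transactions` as a list of item tuples, process it in segments of three, and return a dictionary mapping each frequent itemsequence to its support count. The second pass is needed because a candidate that is frequent in one segment may still fall below the global threshold.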
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.