A Candidate Sequence Generation Algorithm Based on Graph Structure
Cite this article: GUO Ping, LIU Tan-Ren. A Candidate Sequence Generation Algorithm Based on Graph Structure[J]. Computer Science, 2004, 31(1): 136-139
Authors: GUO Ping, LIU Tan-Ren
Affiliation (both authors): College of Computer Science, Chongqing University, Chongqing 400044, China
Funding: National Key Technologies R&D Program of the Tenth Five-Year Plan (Grant No. 2002BA107B)
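Abstract: In sequence data mining, candidate-based algorithms such as Apriori-like algorithms, GSP, and SPADE share a common structure: first generate candidate sequences, then test whether each candidate is frequent, and finally obtain the frequent sequences. Research on candidate sequence generation is therefore broadly applicable. This paper first studies the relationship between a sequence data set (sequence database) and a graph structure, and proves that a necessary condition for a sequence to be frequent is that it corresponds to a complete subgraph. On this basis, it proposes a graph-based candidate sequence generation algorithm and proves the algorithm correct. Mining experiments on the T25I10D10K and T25I20D100K data sets show that mining with the proposed candidate generation algorithm is more efficient than mining with the Apriori algorithm.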
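The generate-and-test structure shared by Apriori-like algorithms can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes each sequence element is a single item (no itemsets, no time constraints), builds candidates with a GSP-style join followed by an Apriori-style prune, and counts support with a plain subsequence scan over the database. The function names `mine_frequent_sequences`, `generate_candidates`, `support`, and `is_subsequence` are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern appear in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator past the first match.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Number of data sequences that contain pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def generate_candidates(frequent_prev):
    """GSP-style join and prune: length-k candidates from frequent length-(k-1) sequences."""
    prev = set(frequent_prev)
    candidates = set()
    for s in prev:
        for t in prev:
            # Join s and t when s without its first item equals t without its last item.
            if s[1:] == t[:-1]:
                cand = s + t[-1:]
                # Prune: deleting any one item must leave a frequent sequence.
                if all(cand[:i] + cand[i + 1:] in prev for i in range(len(cand))):
                    candidates.add(cand)
    return candidates


def mine_frequent_sequences(database, min_support):
    """Generate-and-test loop: candidates -> support counting -> frequent sequences."""
    frequent = {}
    # Length-1 candidates: every distinct item in the database.
    candidates = {(item,) for seq in database for item in seq}
    while candidates:
        level = {}
        for cand in candidates:
            count = support(cand, database)
            if count >= min_support:
                level[cand] = count
        frequent.update(level)
        candidates = generate_candidates(level)
    return frequent
```

For example, with the hypothetical database `db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]` and `min_support=2`, the loop stops after length 3 and reports `("a", "b", "c")` with support 2. The cost of this structure is dominated by the support-counting scan over every candidate, which is why shrinking the candidate set before counting pays off.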

Keywords: sequential pattern; data mining; sequential pattern mining

Graph-Based Candidate Frequent Patterns Generating Algorithm
GUO Ping, LIU Tan-Ren. Graph-Based Candidate Frequent Patterns Generating Algorithm[J]. Computer Science, 2004, 31(1): 136-139
Authors: GUO Ping, LIU Tan-Ren
Abstract: Candidate-based sequence mining algorithms, such as Apriori-like algorithms, GSP, and SPADE, share a common procedure: first generate the candidate frequent patterns, then determine which candidates are frequent, and finally output the frequent patterns. Research on candidate generation is therefore of general significance. In this article, we first investigate the relationship between a sequence data set (sequence database) and a graph structure, and prove that a necessary condition for a sequence to be frequent is that it corresponds to a complete subgraph of the graph. We then propose a graph-based candidate frequent pattern generation algorithm and prove its correctness. Finally, the algorithm is applied to the T25I10D10K and T25I20D100K data sets, where it mines sequence data more efficiently than the Apriori algorithm.
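The complete-subgraph condition can be made concrete under one assumed graph construction; the abstract does not specify the paper's own. Take items as vertices and draw a directed edge a → b whenever the 2-sequence <a b> is frequent. Because every subsequence of a frequent sequence is itself frequent, a frequent sequence <s1 ... sk> must have an edge si → sj for every i < j, so its items span a complete, order-respecting subgraph. The minimal sketch below uses this as a candidate filter; the function names `frequent_pairs`, `build_sequence_graph`, and `passes_graph_condition` are hypothetical.

```python
from collections import Counter


def frequent_pairs(database, min_support):
    """Return every 2-sequence (a, b) contained in at least min_support data sequences."""
    counts = Counter()
    for seq in database:
        # Collect each ordered pair once per sequence (a before b, gaps allowed).
        pairs = {(seq[i], seq[j])
                 for i in range(len(seq))
                 for j in range(i + 1, len(seq))}
        counts.update(pairs)
    return {pair for pair, count in counts.items() if count >= min_support}


def build_sequence_graph(pairs):
    """Adjacency sets for the assumed graph: edge a -> b iff <a b> is frequent."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
    return graph


def passes_graph_condition(candidate, graph):
    """Necessary condition: an edge candidate[i] -> candidate[j] exists for all i < j."""
    return all(candidate[j] in graph.get(candidate[i], ())
               for i in range(len(candidate))
               for j in range(i + 1, len(candidate)))
```

A candidate rejected by `passes_graph_condition` can be discarded without scanning the database. One that passes still needs support counting: in the hypothetical database `[("a", "b"), ("a", "b"), ("b", "c"), ("b", "c"), ("a", "c"), ("a", "c")]` with `min_support=2`, all three pairs of `("a", "b", "c")` are frequent, yet the sequence itself never occurs.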
Keywords: Sequential pattern; Data mining; Sequential pattern mining
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.