首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
为解决传统频繁模式挖掘算法效率不高的问题,提出了一种改进的基于FP-tree (Frequent pattern tree)的Apriori频繁模式挖掘算法.首先,在Apriori算法的连接步加入连接预处理过程;其次,对CP-tree (Compact Pattern tree)进行扩展,构造了一个新的树结构ECP-tree (Extension of Compact Pattern tree),新的树结构只需对数据库进行一次扫描就能构造出一棵紧凑的前缀树,且支持交互式挖掘与增量挖掘;然后,将改进点与APFT算法结合,用于挖掘频繁模式;最后,使用UCI数据库中两个数据集进行实验.实验结果表明:改进算法具有较高的挖掘效率,频繁模式挖掘速度显著提升.  相似文献   

2.
一种单遍扫描频繁模式树结构   总被引:1,自引:0,他引:1  
谭军  卜英勇  杨勃 《计算机工程》2010,36(14):32-33
针对频繁模式增长算法无法适应数据流的无限性和流动性的特点,提出一种新颖的FP-tree的变形结构-SP-tree,只需单遍扫描便能容纳全部数据库信息。为使SP-tree具有与FP-tree一样良好的压缩性能,给出一种有效的动态重构树的方法,称为宽度排序方法,该方法能够在挖掘过程中动态地逐条分支地重构树,最终产生一棵频繁递减的前缀树。实验结果表明,SP-tree的压缩性能优于其他单遍扫描的前缀树结构。  相似文献   

3.
在FP-树中挖掘频繁模式而不生成条件FP-树   总被引:33,自引:1,他引:33  
FP-growth算法是目前已发表的最有效的频繁模式挖掘算法之一.然而,由于在挖掘频繁模式时需要递归地生成大量的条件FP-树,其时空效率仍然不够高.改进了FP-树结构,提出了一种基于被约束子树挖掘频繁项集的有效算法.改进的FP-树是单向的,每个结点只保留指向父结点的指针,这大约节省了三分之一的树空间.通过引入被约束子树(可以用3个很小的数组表示),算法在挖掘频繁模式时不生成条件FP-树,从而大大提高了频繁模式挖掘的时空效率.实验表明,与FP-growth算法相比,算法的挖掘速度提高了1倍以上,而所需的存储空间减少了一半.此外,随着数据库规模的增大,算法具有很好的可伸缩性.对于稠密数据集,算法也具有良好的性能.  相似文献   

4.
关联规则挖掘是数据挖掘重要研究课题,大数据处理对关联规则挖掘算法效率提出了更高要求,而关联规则挖掘的最耗时的步骤是频繁模式挖掘。针对当前频繁模式挖掘算法效率不高的问题,结合Apriori算法和FP-growth算法,提出一种基于事务映射区间求交的频繁模式挖掘算法IITM(interval interaction and transaction mapping),只需扫描数据集两次来生成FP树,然后扫描FP树将每个项的ID映射到区间中,通过区间求交来进行模式增长。该算法解决了Apriori算法需要多次扫描数据集,FP-growth算法需要迭代地生成条件FP树来进行模式增长而带来的效率下降的问题。在真实数据集上的实验显示,在不同的支持度下IITM算法都要要优于Apriori、FP-growth以及PIETM算法。  相似文献   

5.
OSAF-tree--可迭代的移动序列模式挖掘及增量更新方法   总被引:1,自引:0,他引:1  
移动通信技术和无限定位技术的发展积累了海量的、动态增长的时空数据.利用数据挖掘技术从移动用户的时空行为轨迹当中挖掘用户移动序列模式,在移动通信、交通管理、基于位置服务等领域有着广泛的应用前景.由于移动环境网络资源珍贵、数据量大的特点,传统的序列模式挖掘方法在效率上很难满足需求.OSAF-tree算法基于投影的概念,只需要对数据库进行一遍扫描,就可以很好地处理移动序列模式的挖掘及其增量更新和迭代挖掘问题,这是一个非常高效的算法.与已有的方法相比,OSAF-tree算法在性能和I/O代价等方面都具有明显的优势.  相似文献   

6.
基于改进FP-树的最大项目集挖掘算法*   总被引:1,自引:0,他引:1  
挖掘最大频繁项目集是多种数据挖掘应用中的关键问题。FP-growth算法是目前最有效的频繁模式挖掘算法之一,其在挖掘最大项目集时要递归生成大量的条件FP-树,存在时空效率不高的问题。于是结合改进的FP-树,提出了一种快速挖掘最大项目集的算法。该算法利用改进的FP-树是单向的且每个节点只保留指向父节点的指针,可以节约大量的存储空间;同时引入项目序列集和它的基本操作,使挖掘最大频繁项目集时不生成含大量候选项目的集合或条件FP-树,可以快速地挖掘出所有的最大频繁项目集。实例分析证明所提出的算法是可行的。  相似文献   

7.
FP-growth算法是目前较高效的频繁模式挖掘算法之一,该算法不产生候选项集,但递归构造“条件FP-Tree”的CPU 开销和存储很大.为此提出了一种频繁模式挖掘算法IFPmine.首先,为了节省内存空间,采用了约束子树的挖掘方法;其次,采用了数组技术来减少树的遍历时间,从而提高算法的效率.实验结果表明,IFP算法是一种较有效的频繁模式挖掘算法,其挖掘效率优于STFP-树算法和FP-树算法,而需要的内存却少于STFP-树和FP-树算法.  相似文献   

8.
特定数据最大频繁集挖掘算法   总被引:2,自引:0,他引:2       下载免费PDF全文
针对在某些限定项目数与交易长度数据的关联规则挖掘中FP-growth算法执行效率很低的问题,提出一种最大频繁模式挖掘算法,该算法引入与FP-tree结构类似的All-subset tree存储所有的最大频繁项目集,无需在扫描数据库前指定最小支持度,可以动态给定最小支持度而不用重新扫描数据库。实验结果表明,该算法在这些特定数据的挖掘中,与FP-growth相比明显提高了挖掘效率。  相似文献   

9.
一种基于MFP树的快速关联规则挖掘算法   总被引:1,自引:0,他引:1  
在关联规则挖掘FP-Growth算法的基础上,提出一种基于MFP树的快速关联规则挖掘算法。文中给出了MFP算法的工作原理。MFP算法能在一次扫描事务数据库的过程中,把该数据库转换成MFP树,然后对MFP树进行关联规则挖掘。MFP算法比FP-Growth算法减少一次对事务数据的扫描,因此具有较高的时间效率。  相似文献   

10.
Efficient Incremental Maintenance of Frequent Patterns with FP-Tree   总被引:3,自引:0,他引:3       下载免费PDF全文
Mining frequent patterns has been studied popularly in data mining area. However, little work has been done on mining patterns when the database has an influx of fresh data constantly. In these dynamic scenarios, efficient maintenance of the discovered patterns is crucial. Most existing methods need to scan the entire database repeatedly, which is an obvious disadvantage. In this paper, an efficient incremental mining algorithm, Incremental-Mining (IM), is proposed for maintenance of the frequent patterns when new incremental data come. Based on the frequent pattern tree (FP-tree) structure, IM gives a way to make the most of the things from the previous mining process, and requires scanning the original data once at most. Furthermore, IM can identify directly the differential set of frequent patterns, which may be more informative to users. Moreover, IM can deal with changing thresholds as well as changing data, thus provide a full maintenance scheme. IM has been implemented and the performance study shows it outperforms three other incremental algorithms: FUP, DB-tree and re-running frequent pattern growth (FP-growth).  相似文献   

11.
在关联规则挖掘FP-Growth算法的基础上,提出一种基于MFP树的快速关联规则挖掘算法。文中给出了MFP算法的工作原理。MFP算法能在一次扫描事务数据库的过程中,把该数据库转换成MFP树,然后对MFP树进行关联规则挖掘。MFP算法比FP-Growth算法减少一次对事务数据的扫描,因此具有较高的时间效率。  相似文献   

12.
不产生候选的快速投影频繁模式树挖掘算法   总被引:8,自引:0,他引:8  
1.概述近年来,对事务数据库、时序数据库和各种其它类型数据库中的频繁模式挖掘的研究越来越普及。许多先前的研究都是采用Apriori或类似的候选产生—检查迭代算法,使用候选项集来找频繁项集。这些算法都基于一种重要的反单调的Apriori性质:任何非频繁的(k—1)-项集都不可能是频繁k-项集的子集。因此,如果一个候选k-项集的(k—1)-子集不在频繁(k—1)-项集中,则该候选也不可能是频繁的,从而可  相似文献   

13.
石巍  傅彦 《计算机科学》2006,33(6):206-209
通分析FP-growth算法中包含的冗余操作,引入数据结构FP参考树/表,改变FPgrowth算法中条件模式基的存储和生成方式,提出了新的FPRSG算法,高效地解决了频繁模式挖掘问题。理论分析与实验结果表明,FPRSG算法优于FPgrowth算法。  相似文献   

14.
In the present scenario of global economy and World Wide Web, large sets of evolving and distributed data can be handled efficiently by incremental data mining. Frequent patterns are very important in knowledge discovery and data mining process, such as mining of association rules, correlations. FP-tree is a very versatile data structure used for mining of frequent patterns in knowledge discovery and data mining process. FP-tree is a compact representation of transaction database that contains frequency information of all relevant frequent patterns (FP) of the database. All of the existing incremental frequent pattern mining algorithms, such as AFPIM, CATS, CanTree, CP-tree, and SPO-tree, perform incremental mining by processing one transaction of the incremental part of database at a time and updating it to the FP-tree of initial (original) database. Here, in this paper, we propose a novel method that takes advantage of FP-tree representation of incremental transaction database for incremental mining. We propose a batch incremental processing algorithm BIT_FPGrowth that restructures and merges two small consecutive duration FP-trees to obtain a FP-tree of the FP-Growth algorithm. Our BIT_FPGrowth uses FP-tree as preprocessed data repository to get transactions (i.e., item-sets), unlike other sequential incremental algorithms that read transactions from database. BIT_FPGrowth algorithm takes less time for constructing FP-tree. Our experimental results show that, as the size of the database increases, increase in runtime of BIT_FPGrowth is much less and is least of all the other algorithms.  相似文献   

15.
基于电子病历的频繁模式挖掘研究   总被引:1,自引:1,他引:1  
提出了对电子病历挖掘频繁集的一种方法,该方法引入了一种称为模式仓库结构,它以简洁方式来存储所有相关信息且可直接导出FP-tree;给出了一种称为FP-aprgrowth算法,它综合了Apriori方法和FP-growth方法二者的优点,接着阐明了FP-tree映射到决策树的原理和方法。最后给出一个例子显示其处理过程及效率。  相似文献   

16.
通过对关联规则挖掘技术及经典算法Apriori和FP-growth的研究和分析,提出了一种改进的频繁项集挖掘算法。该算法利用矩阵存储数据,并结合矩阵运算求项集的支持数,有效减少了事务数据库的扫描次数;利用有序频繁项目邻接矩阵创建频繁模式树,有效减少了频繁模式树的分支和层数。通过实例分析了频繁项集的挖掘过程。  相似文献   

17.
一种直接在Trans-树中挖掘频繁模式的新算法   总被引:5,自引:1,他引:5  
范明  王秉政 《计算机科学》2003,30(8):117-120
Frequent pattern mining plays an essential role in many important data mining tasks. FP-growth is a very efficient algorithm for frequent pattern mining. However, it still suffers from creating conditional FP-tree separately and recursively during the mining process. In this paper, we propose a new algorithm, called Least-Item-First Pat-tern Growth (LIFPG), for mining frequent patterns. LIFPG mines frequent patterns directly in Trans-tree withoutusing any additional data structures. The key idea is that least items are always considered first when the current pat-tern growth. By this way, conditional sub-tree can be created directly in Trans-tree by adjusting node-links and re-counting counts of some nodes. Experiments show that, in comparison with FP-Growth, our algorithm is about fourtimes faster and saves half of memory;it also has good time and space scalability with the number of transactions,and has an excellent performance in dense dataset mining as well.  相似文献   

18.
FP-growth算法是一种基于FP-tree数据结构的高效的频繁模式挖掘算法,它不产生候选集。构造频繁模式树FP-tree需扫描数据库两次,在第二遍扫描中还扫描了那些仅包含了非频繁项的事务,针对此问题,在深入分析了FP-tree特性的基础上, 改进了FP-tree构造过程,同时用一种基于Hash表的辅助存储结构,节省了项目查找时间,提高了挖掘效率。  相似文献   

19.
In the past decade, XML has emerged as the standard language for information exchanging over the Internet. Due to its tree-structure paradigm, XML is superior for its capability of storing, querying, and manipulating complex data. Therefore, discovering frequent tree patterns over tree-structured data has become an interesting topic for XML data management. In this paper, we propose a tree mining algorithm, named BUXMiner, for finding a special class of frequent trees, called rooted unordered trees, from a tree-structured database. BUXMiner employs an efficient bottom-up approach to enumerate all candidate trees over a compact global tree guide and computes the frequent trees based on the tree guide. In addition to BUXMiner, we also propose a mining approach called BUMXMiner to discover the maximal frequent rooted unordered trees. We compare BUXMiner with previous tree-structure mining algorithms, namely XQPMinerTID and FastXMiner, which were also proposed to discover rooted unordered trees. The experimental results show that our algorithm outperforms XQPMinerTID and FastXMiner in terms of efficiency. The performance results from real-world applications also indicate the usefulness of our proposed tree mining algorithms in a variety of web applications, such as analysis of web page access patterns and mining frequent XML query patterns for caching.  相似文献   

20.
Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist a large number of patterns and/or long patterns.In this study, we propose a novel frequent-pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a condensed, smaller data structure, FP-tree which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern-fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent-pattern mining methods.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号