首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
In the present scenario of global economy and World Wide Web, large sets of evolving and distributed data can be handled efficiently by incremental data mining. Frequent patterns are very important in knowledge discovery and data mining process, such as mining of association rules, correlations. FP-tree is a very versatile data structure used for mining of frequent patterns in knowledge discovery and data mining process. FP-tree is a compact representation of transaction database that contains frequency information of all relevant frequent patterns (FP) of the database. All of the existing incremental frequent pattern mining algorithms, such as AFPIM, CATS, CanTree, CP-tree, and SPO-tree, perform incremental mining by processing one transaction of the incremental part of database at a time and updating it to the FP-tree of initial (original) database. Here, in this paper, we propose a novel method that takes advantage of FP-tree representation of incremental transaction database for incremental mining. We propose a batch incremental processing algorithm BIT_FPGrowth that restructures and merges two small consecutive duration FP-trees to obtain a FP-tree of the FP-Growth algorithm. Our BIT_FPGrowth uses FP-tree as preprocessed data repository to get transactions (i.e., item-sets), unlike other sequential incremental algorithms that read transactions from database. BIT_FPGrowth algorithm takes less time for constructing FP-tree. Our experimental results show that, as the size of the database increases, increase in runtime of BIT_FPGrowth is much less and is least of all the other algorithms.  相似文献   

2.
Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist a large number of patterns and/or long patterns.In this study, we propose a novel frequent-pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a condensed, smaller data structure, FP-tree which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern-fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent-pattern mining methods.  相似文献   

3.
提出了一种新颖的频繁模式挖掘算法,该算法与现有的挖掘算法相比具有明显的优点,首先,该算法不需要产生候选项集,其次该算法具有更少的数据库扫描次数,该算法在中小型数据库上挖掘关联规则只需要扫描交易数据库一次,对于大型交易数据库的关联规则挖掘最多也只需要扫描交易数据库两次。因而,该算法与现有的频繁模式挖掘算法相比具有更高的效率。  相似文献   

4.
Inter-sequence pattern mining can find associations across several sequences in a sequence database, which can discover both a sequential pattern within a transaction and sequential patterns across several different transactions. However, inter-sequence pattern mining algorithms usually generate a large number of recurrent frequent patterns. We have observed mining closed inter-sequence patterns instead of frequent ones can lead to a more compact yet complete result set. Therefore, in this paper, we propose a model of closed inter-sequence pattern mining and an efficient algorithm called CISP-Miner for mining such patterns, which enumerates closed inter-sequence patterns recursively along a search tree in a depth-first search manner. In addition, several effective pruning strategies and closure checking schemes are designed to reduce the search space and thus accelerate the algorithm. Our experiment results demonstrate that the proposed CISP-Miner algorithm is very efficient and outperforms a compared EISP-Miner algorithm in most cases.  相似文献   

5.
基于概率衰减窗口模型的不确定数据流频繁模式挖掘   总被引:2,自引:0,他引:2  
考虑到不确定数据流的不确定性,设计了一种新的概率频繁模式树PFP-tree和基于该树的概率频繁模式挖掘方法PFP-growth.PFP-growth使用事务性不确定数据流及概率衰减窗口模型,通过计算各概率数据项的期望支持度以发现概率频繁模式,其主要特点有:考虑到窗口内不同时间到达数据项的贡献度不同,采用概率衰减窗口模型计算期望支持度,以提高模式挖掘准确度;设置数据项索引表和事务索引表,以加快频繁模式树检索速度;通过剪枝删除不可能成为频繁模式的结点,以降低模式树的存储及检索开销;对每个结点都设立一个事务概率信息链表,以支持数据项在不同事务中具有不同概率的情形.实验结果表明,PFP-growth在保证挖掘模式准确度的前提下,在处理时间和内存空间等方面都具有较好的性能.  相似文献   

6.
The purpose of mining frequent itemsets is to identify the items in groups that always appear together and exceed the user-specified threshold of a transaction database. However, numerous frequent itemsets may exist in a transaction database, hindering decision making. Recently, the mining of frequent closed itemsets has become a major research issue because sets of frequent closed itemsets are condensed yet complete representations of frequent itemsets. Therefore, all frequent itemsets can be derived from a group of frequent closed itemsets. Nonetheless, the number of transactions in a transaction database can increase rapidly in a short time period, and a number of the transactions may be outdated. Thus, frequent closed itemsets may be changed with the addition of new transactions or the deletion of old transactions from the transaction database. Updating previously closed itemsets when transactions are added or removed from the transaction database is challenging. This study proposes an efficient algorithm for incrementally mining frequent closed itemsets without scanning the original database. The proposed algorithm updates closed itemsets by performing several operations on the previously closed itemsets and added/deleted transactions without searching the previously closed itemsets. The experimental results show that the proposed algorithm significantly outperforms previous methods, which require a substantial length of time to search previously closed itemsets.  相似文献   

7.
Mining sequential patterns is to discover sequential purchasing behaviours for most of the customers from a large number of customer transactions. The strategy of mining sequential patterns focuses on discovering frequent sequences. A frequent sequence is an ordered list of the itemsets purchased by a sufficient number of customers. The previous approaches for mining sequential patterns need to repeatedly scan the database so that they take a large amount of computation time to find frequent sequences. The customer transactions will grow rapidly in a short time, and some of the customer transactions may be antiquated. Consequently, the frequent sequences may be changed due to the insertion of new customer transactions or the deletion of old customer transactions from the database. It may require rediscovering all the patterns by scanning the entire updated customer transaction database. In this paper, we propose an incremental updating technique to maintain the discovered sequential patterns when transactions are inserted into or deleted from the database. Our approach partitions the database into some segments and scans the database segment by segment. For each segment scan, our approach prunes those sequences that cannot be frequent sequences any more to accelerate the finding process of the frequent sequences. Therefore, the number of database scans can be significantly reduced by our approach. The experimental results show that our algorithms are more efficient than other algorithms for the maintenance of mining sequential patterns.  相似文献   

8.
Traditional frequent pattern mining methods consider an equal profit/weight for all items and only binary occurrences (0/1) of the items in transactions. High utility pattern mining becomes a very important research issue in data mining by considering the non-binary frequency values of items in transactions and different profit values for each item. However, most of the existing high utility pattern mining algorithms suffer in the level-wise candidate generation-and-test problem and generate too many candidate patterns. Moreover, they need several database scans which are directly dependent on the maximum candidate length. In this paper, we present a novel tree-based candidate pruning technique, called HUC-Prune (High Utility Candidates Prune), to solve these problems. Our technique uses a novel tree structure, called HUC-tree (High Utility Candidates tree), to capture important utility information of the candidate patterns. HUC-Prune avoids the level-wise candidate generation process by adopting a pattern growth approach. In contrast to the existing algorithms, its number of database scans is completely independent of the maximum candidate length. Extensive experimental results show that our algorithm is very efficient for high utility pattern mining and it outperforms the existing algorithms.  相似文献   

9.
Most algorithms related to association rule mining are designed to discover frequent itemsets from a binary database. Other factors such as profit, cost, or quantity are not concerned in binary databases. Utility mining was thus proposed to measure the utility values of purchased items for finding high-utility itemsets from a static database. In real-world applications, transactions are changed whether insertion or deletion in a dynamic database. An existing maintenance approach for handling high-utility itemsets in dynamic databases with transaction deletion must rescan the database when necessary. In this paper, an efficient algorithm, called PRE-HUI-DEL, for updating high-utility itemsets based on the pre-large concept for transaction deletion is proposed. The pre-large concept is used to partition transaction-weighted utilization itemsets into three sets with nine cases according to whether they have large (high), pre-large, or small transaction-weighted utilization in the original database and in the deleted transactions. Specific procedures are then applied to each case for maintaining and updating the discovered high-utility itemsets. Experimental results show that the proposed PRE-HUI-DEL algorithm outperforms a batch two-phase algorithm and a FUP2-based algorithm in maintaining high-utility itemsets.  相似文献   

10.
In recent years, generalization-based data mining techniques have become an interesting topic for many data scientists. Generalized itemset mining is an exploration technique that focuses on extracting high-level abstractions and correlations in a database. However, the problem that domain experts must always deal with is how to manage and interpret a large number of extracted patterns from a massive database of transactions. In generalized pattern mining, taxonomies that contain abstraction information for each dataset are defined, so the number of frequent patterns can grow enormously. Therefore, exploiting knowledge turns into a difficult and costly process. In this article, we introduce an approach that uses cardinality-based constraints with transaction id and numeric encoding to mine generalized patterns. We applied transaction id to support the computation of each frequent itemset as well as to encode taxonomies into a numeric type using two simple rules. We also attempted to apply the combination of cardinality cons- traints and closed or maximal patterns. Experiments show that our optimizations significantly improve the performance of the original method, and the importance of comprehensive information within closed and maximal patterns is worth considering in generalized frequent pattern mining.  相似文献   

11.
传统的关联规则挖掘研究事务中所包含的项与项之间的关联性,而负关联规则挖掘不仅要考虑事务中包含的项,还要考虑事务中不包含的项。给出了完全负关联规则的定义,提出一种基于树的算法Free-PNP,通过此算法挖掘数据库中的负频繁模式,继而得到所要挖掘的完全负关联规则。通过实验验证了算法的有效性。  相似文献   

12.
韩萌  丁剑 《计算机应用》2019,39(3):719-727
一些先进应用如欺诈检测和趋势学习等带来了数据流频繁模式挖掘的发展。不同于静态数据,数据流挖掘面临着时空约束和项集组合爆炸等问题。对已有数据流频繁模式挖掘算法进行综述并对经典和最新算法进行分析。按照模式集合的完整程度进行分类,数据流中频繁模式分为全集模式和压缩模式。压缩模式主要包括闭合模式、最大模式、top-k模式以及三者的组合模式。不同之处是闭合模式是无损压缩的,而其他模式是有损压缩的。为了得到有趣的频繁模式,可以挖掘基于用户约束的模式。为了处理数据流中的新近事务,将算法分为基于窗口模型和基于衰减模型的方法。数据流中模式挖掘常见的还包含序列模式和高效用模式,对经典和最新算法进行介绍。最后给出了数据流模式挖掘的下一步工作。  相似文献   

13.
When computationally feasible, mining huge databases produces tremendously large numbers of frequent patterns. In many cases, it is impractical to mine those datasets due to their sheer size; not only the extent of the existing patterns, but mainly the magnitude of the search space. Many approaches have suggested the use of constraints to apply to the patterns or searching for frequent patterns in parallel. So far, those approaches are still not genuinely effective to mine extremely large datasets. We propose a method that combines both strategies efficiently, i.e. mining in parallel for the set of patterns while pushing constraints. Using this approach we could mine significantly large datasets; with sizes never reported in the literature before. We are able to effectively discover frequent patterns in a database made of billion transactions using a 32 processors cluster in less than an hour and a half. Recommended by: Ahmed Elmagarmid  相似文献   

14.
张璐璐  贾瑞玉  李杰 《微机发展》2006,16(12):73-75
离群数据挖掘是指从大量数据中挖掘明显偏离、不满足一般行为模式的数据。现有的离群数据挖掘算法大多对密集的交易数据库缺乏有效的处理,文中提出了一种高效的基于规则的离群挖掘算法。该算法使用了多层最大离群支持度及最小离群兴趣度,计算1-离群条件集的幂集,并在数据结构中存储了交易标识符链表,使得扫描数据库的次数仅为一次,从而提高了挖掘的速度、效率且使得结果更具有决策意义。文中使用此算法对某一商场的部分销售数据库进行了实验,结果表明该算法能有效、迅速地发现密集数据库中的离群数据。  相似文献   

15.
FP-growth算法的实现方法研究   总被引:8,自引:0,他引:8  
事务数据库中频繁模式的挖掘研究作为关联规则等许多数据挖掘问题的核心工作,已经研究了许多年。早期算法大都是Apriori型算法,即首先产生候选集,然后在候选集的基础上找出频繁模式,候选集的产生往往是耗时的,特别是挖掘富模式或长模式时。JianweiHan等人提出了一种新颖的数据结构FP-tree及基于其上的FP-growth算法,用于有效的富模式与长模式挖掘。由于不同的实现方法可能会导致不同的挖掘效率,该文在讨论FP-growth算法的基础上,采用了几种不同的方法来实现它,并用几个数据库对它们的性能进行了比较。  相似文献   

16.
Data-mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data-mining algorithms identify the relationships among transactions using binary values, however, transactions with quantitative values are commonly seen in real-world applications. This paper thus proposes a new data-mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm integrates fuzzy set concepts and the apriori mining algorithm to find interesting fuzzy association rules in given transaction data sets. Experiments with student grades at I-Shou University were also made to verify the performance of the proposed algorithm.  相似文献   

17.
A core issue of the association rule extracting process in the data mining field is to find the frequent patterns in the database of operational transactions. If these patterns discovered, the decision making process and determining strategies in organizations will be accomplished with greater precision. Frequent pattern is a pattern seen in a significant number of transactions. Due to the properties of these data models which are unlimited and high-speed production, these data could not be stored in memory and for this reason it is necessary to develop techniques that enable them to be processed online and find repetitive patterns. Several mining methods have been proposed in the literature which attempt to efficiently extract a complete or a closed set of different types of frequent patterns from a dataset. In this paper, a method underpinned upon Cellular Learning Automata (CLA) is presented for mining frequent itemsets. The proposed method is compared with Apriori, FP-Growth and BitTable methods and it is ultimately concluded that the frequent itemset mining could be achieved in less running time. The experiments are conducted on several experimental data sets with different amounts of minsup for all the algorithms as well as the presented method individually. Eventually the results prod to the effectiveness of the proposed method.  相似文献   

18.
IS-树是一种新型的全文存储索引模型.提出一种基于扩展I-S树模型的频繁模式挖掘算法.和FPgrowth方法一样,算法直接构造频繁项集,不进行Apriori算法所采用的代价很高的候选集产生与测试操作.然而它比FP-树模型具有更多的优点:只需扫描一遍事务库;挖掘任务只局部关联于一棵根树;动态更新性好,仅做增量变化.实验表明,其具有与FP-growth算法相当甚至更高的效率.更重要的是,IS 树模型同时是一种事务库的良好索引形式,具有高效支持事务查询的能力.  相似文献   

19.
Existing algorithms of mining frequent XML query patterns (XQPs) employ a candidate generate-and-test strategy. They involve expensive candidate enumeration and costly tree-containment checking. Further, most of existing methods compute the frequencies of candidate query patterns from scratch periodically by checking the entire transaction database, which consists of XQPs transferred from user query logs. However, it is not straightforward to maintain such discovered frequent patterns in real XML databases as there may be frequent updates that may not only invalidate some existing frequent query patterns but also generate some new frequent query patterns. Therefore, a drawback of existing methods is that they are rather inefficient for the evolution of transaction databases. To address above-mentioned problems, this paper proposes an efficient algorithm ESPRIT to mine frequent XQPs without costly tree-containment checking. ESPRIT transforms XML queries into sequences using a one-to-one mapping technique and mines the frequent sequences to generate frequent XQPs. We propose two efficient incremental algorithms, ESPRIT-i and ESPRIT-i +, to incrementally mine frequent XQPs. We devise several novel optimization techniques of query rewriting, cache lookup, and cache replacement to improve the answerability and the hit rate of caching. We have implemented our algorithms and conducted a set of experimental studies on various datasets. The experimental results demonstrate that our algorithms achieve high efficiency and scalability and outperform state-of-the-art methods significantly.  相似文献   

20.
关联规则挖掘算法介绍   总被引:6,自引:0,他引:6  
数据挖掘是一个多学科交叉融合而形成的新兴的学科,它利用各种分析工具在海量数据中发现模型和数据间的关系。而在大规模事务数据库中,挖掘关联规则是数据挖掘领域的一个非常重要的研究课题。文中介绍了关联规则挖掘的研究情况,描述了经典Apriori算法的实现,并对该算法进行了分析和评价,指出了其不足和原因。描述了FP树挖掘最大频繁项集的算法,通过实例对该算法进行了性能评估,并得到结论:数据库中潜在的最大频繁模式越多,运行时间越长。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号