首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Previous research works have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. Upon discovery of frequent closed XML query patterns, indexing and caching can be effectively adopted for query performance enhancement. Most of the previous algorithms for finding frequent patterns basically introduced a straightforward generate-and-test strategy. In this paper, we present SOLARIA*, an efficient algorithm for mining frequent closed XML query patterns without candidate maintenance and costly tree-containment checking. Efficient algorithm of sequence mining is involved in discovering frequent tree-structured patterns, which aims at replacing expensive containment testing with cheap parent-child checking in sequences. SOLARIA* deeply prunes unrelated search space for frequent pattern enumeration by parent-child relationship constraint. By a thorough experimental study on various real-life data, we demonstrate the efficiency and scalability of SOLARIA* over the previous known alternative. SOLARIA* is also linearly scalable in terms of XML queries' size.  相似文献   

2.
王树怡  董东 《计算机科学》2017,44(Z6):486-490
在软件开发过程中,开发人员经常需要遵循特定的API用法模式,而这些用法模式几乎没有相关文档作为参考。为了挖掘API用法模式,提出基于聚类和频繁闭合偏序序列的API用法模式挖掘途径。通过抽象语法树对源代码进行解析,对提取API方法调用序列进行层次聚类,最后使用频繁闭合偏序挖掘算法DFP进行API用法模式的挖掘。实验结果表明,在相同的数据集上,与SPADE算法和BIDE算法相比,所得候选API用法模式集更加精简。  相似文献   

3.
《Knowledge》2007,20(1):86-97
Frequent pattern mining is one of main concerns in data mining tasks. In frequent pattern mining, closed frequent pattern mining and weighted frequent pattern mining are two main approaches to reduce the search space. Although many related studies have been suggested, no mining algorithm considers both paradigms. Even if closed frequent pattern mining represents exactly the same knowledge and weighted frequent pattern mining provides a way to discover more important patterns, the incorporation of closed frequent pattern mining and weight frequent pattern mining may loss information. Based on our analysis of joining orders, we propose closed weighted frequent pattern mining, and present how to discover succinct but lossless closed frequent pattern with weight constraints. To our knowledge, ours is the first work specifically to consider both constraints. An extensive performance study shows that our algorithm outperforms previous algorithms. In addition, it is efficient and scalable.  相似文献   

4.
序列模式的挖掘是近年来的研究热点之一,目前很多研究都集中在闭合频繁项集与闭合序列模式的挖掘,较少涉及更加复杂、有重要应用价值的组合序列模式.针对任意长度和任意组合次数的频繁组合序列模式,提出了一种挖掘全部闭合的组合序列的算法CloCSP.为克服指数量级的候选序列进行闭合检验的困难,提出了既能生成频繁组合序列,又能有效剪枝,并同时完成闭合检验的混合扩展策略,该策略无需维护候选集.实验表明,CloCSP算法能够有效挖掘出隐藏在序列数据中,尤其是稠密数据集内的闭合组合序列模式,有助于揭示更加复杂的序列模式.  相似文献   

5.
; 对于不确定数据的频繁序列模式挖掘,会导致可能频繁模式数量的指数级出现,其中有些无用的挖掘结果,引起频繁序列的冗余。针对上述不足, 提出了可能频繁闭序列模式(pfcsp)的定义, 以及一种基于不确定数据的可能频繁闭序列挖掘算法U-FCSM。此算法中,基于一种元组不确定数据模型,计算序列的可能频繁性,应用BIDE算法的闭序列思想判断可能频繁序列是否是可能频繁闭序列模式。为了减少搜索空间与避免冗余的计算,应用了几个剪枝与边界技术。U-FCSM算法的有效性与效率通过大量的实验得以表明。  相似文献   

6.
Fast and memory efficient mining of frequent closed itemsets   总被引:12,自引:0,他引:12  
This paper presents a new scalable algorithm for discovering closed frequent itemsets, a lossless and condensed representation of all the frequent itemsets that can be mined from a transactional database. Our algorithm exploits a divide-and-conquer approach and a bitwise vertical representation of the database and adopts a particular visit and partitioning strategy of the search space based on an original theoretical framework, which formalizes the problem of closed itemsets mining in detail. The algorithm adopts several optimizations aimed to save both space and time in computing itemset closures and their supports. In particular, since one of the main problems in this type of algorithms is the multiple generation of the same closed itemset, we propose a new effective and memory-efficient pruning technique, which, unlike other previous proposals, does not require the whole set of closed patterns mined so far to be kept in the main memory. This technique also permits each visited partition of the search space to be mined independently in any order and, thus, also in parallel. The tests conducted on many publicly available data sets show that our algorithm is scalable and outperforms other state-of-the-art algorithms like CLOSET+ and FP-CLOSE, in some cases by more than one order of magnitude. More importantly, the performance improvements become more and more significant as the support threshold is decreased.  相似文献   

7.
Mining frequent patterns, including mining frequent closed patterns or maximal patterns, is a fundamental and important problem in data mining area. Many algorithms adopt the pattern growth approach, which is shown to be superior to the candidate generate-and-test approach, especially when long patterns exist in the datasets. In this paper, we identify the key factors that influence the performance of the pattern growth approach, and optimize them to further improve the performance. Our algorithm uses a simple while compact data structure—ascending frequency ordered prefix-tree (AFOPT) to store the conditional databases, in which we use arrays to store single branches to further save space. The AFOPT structure is traversed in top-down depth-first order. Our analysis and experiment results show that the combination of the top-down traversal strategy and the ascending frequency order achieves significant performance improvement over previous works.  相似文献   

8.
Mining frequent patterns, including mining frequent closed patterns or maximal patterns, is a fundamental and important problem in data mining area. Many algorithms adopt the pattern growth approach, which is shown to be superior to the candidate generate-and-test approach, especially when long patterns exist in the datasets. In this paper, we identify the key factors that influence the performance of the pattern growth approach, and optimize them to further improve the performance. Our algorithm uses a simple while compact data structure—ascending frequency ordered prefix-tree (AFOPT) to store the conditional databases, in which we use arrays to store single branches to further save space. The AFOPT structure is traversed in top-down depth-first order. Our analysis and experiment results show that the combination of the top-down traversal strategy and the ascending frequency order achieves significant performance improvement over previous works.  相似文献   

9.
Inter-sequence pattern mining can find associations across several sequences in a sequence database, which can discover both a sequential pattern within a transaction and sequential patterns across several different transactions. However, inter-sequence pattern mining algorithms usually generate a large number of recurrent frequent patterns. We have observed mining closed inter-sequence patterns instead of frequent ones can lead to a more compact yet complete result set. Therefore, in this paper, we propose a model of closed inter-sequence pattern mining and an efficient algorithm called CISP-Miner for mining such patterns, which enumerates closed inter-sequence patterns recursively along a search tree in a depth-first search manner. In addition, several effective pruning strategies and closure checking schemes are designed to reduce the search space and thus accelerate the algorithm. Our experiment results demonstrate that the proposed CISP-Miner algorithm is very efficient and outperforms a compared EISP-Miner algorithm in most cases.  相似文献   

10.
钱雪忠  惠亮 《计算机应用》2011,31(5):1339-1343
基于FP-tree的最大频繁模式挖掘算法是目前较为高效的频繁模式挖掘算法,针对这些算法需要递归生成条件FP-tree、产生大量候选最大频繁项集等问题,在分析FPMax、DMFIA算法的基础上,提出基于降维的最大频繁模式挖掘算法(BDRFI)。该算法改传统的FP-tree为数字频繁模式树DFP-tree,提高了超集检验的效率;采用的预测剪枝策略减少了挖掘的次数;基于降低项集维度的挖掘方式,减少了候选项的数目,避免了递归地产生条件频繁模式树,提高了算法的效率。实验结果表明,BDRFI的效率是同类算法的2~8倍。  相似文献   

11.
In this paper, we propose an efficient algorithm, called CMP-Miner, to mine closed patterns in a time-series database where each record in the database, also called a transaction, contains multiple time-series sequences. Our proposed algorithm consists of three phases. First, we transform each time-series sequence in a transaction into a symbolic sequence. Second, we scan the transformed database to find frequent patterns of length one. Third, for each frequent pattern found in the second phase, we recursively enumerate frequent patterns by a frequent pattern tree in a depth-first search manner. During the process of enumeration, we apply several efficient pruning strategies to remove frequent but non-closed patterns. Thus, the CMP-Miner algorithm can efficiently mine the closed patterns from a time-series database. The experimental results show that our proposed algorithm outperforms the modified Apriori and BIDE algorithms.  相似文献   

12.
In this paper, we proposed an efficient algorithm, called PCP-Miner (Pointset Closed Pattern Miner), for mining frequent closed patterns from a pointset database, where a pointset contains a set of points. Our proposed algorithm consists of two phases. First, we find all frequent patterns of length two in the database. Second, for each pattern found in the first phase, we recursively generate frequent closed patterns by a frequent pattern tree in a depth-first search manner. Since the PCP-Miner does not generate unnecessary candidates, it is more efficient and scalable than the modified Apriori, SASMiner and MaxGeo. The experimental results show that the PCP-Miner algorithm outperforms the comparing algorithms by more than one order of magnitude.  相似文献   

13.
不产生候选的快速投影频繁模式树挖掘算法   总被引:8,自引:0,他引:8  
1.概述近年来,对事务数据库、时序数据库和各种其它类型数据库中的频繁模式挖掘的研究越来越普及。许多先前的研究都是采用Apriori或类似的候选产生—检查迭代算法,使用候选项集来找频繁项集。这些算法都基于一种重要的反单调的Apriori性质:任何非频繁的(k—1)-项集都不可能是频繁k-项集的子集。因此,如果一个候选k-项集的(k—1)-子集不在频繁(k—1)-项集中,则该候选也不可能是频繁的,从而可  相似文献   

14.
Mining frequent sequences in sequential databases are highly valuable for many real-life applications. However, in several cases, especially when databases are huge and when low minimum support thresholds are used, the cardinality of the result set can be enormous. Consequently, algorithms for discovering frequent sequences exhibit poor performance, showing an important increase in execution time, memory consumption and storage space usage. To address this issue, researchers have studied the tasks of mining frequent closed and generator sequences, as they provide several benefits when compared to the set of frequent sequences. One of the most important benefits is that the cardinalities of frequent closed and generator sequences are generally much less than the cardinality of frequent sequences. Hence, humans find it more convenient to analyze the information provided by closed and generator sequences. Moreover, it was shown that frequent closed sequences have the advantage of being lossless, and they thus preserve information about the frequency of all frequent subsequences, while generator sequences can provide higher accuracy for sequence classification tasks since they are the smallest patterns that characterize groups of sequences. Besides, frequent closed sequences can be combined with generators to produce non-redundant sequential rules and recover the complete set of frequent sequences and their frequencies. This paper proposes two novel algorithms named FCloSM and FGenSM to mine frequent closed and generator sequences efficiently. These algorithms are based on new pruning conditions called extended early elimination (3E) and early pruning techniques named EPCLO and EPGEN, designed to identify non-closed and non-generator patterns early. Based on these techniques, two local pruning strategies called LPCLO and LPGEN are proposed to eliminate non-closed and non-generator patterns more efficiently at two successive levels of the prefix search tree without performing subsequence relation checking. These theoretical results, which are the basis of FCloSM and FGenSM, are mathematically proved and are shown to be more general than those presented in previous work. Extensive experiments show that FCloSM and FGenSM are one to two orders of magnitude faster than the state-of-the-art algorithms for discovering frequent closed sequences (CloSpan, BIDE, ClaSP and CM-ClaSP) and for mining frequent generators (FEAT, FSGP and VGEN), and that FCloSM and FGenSM consume much less memory.  相似文献   

15.
频繁模式挖掘是最基本的数据挖掘问题,由于内在复杂性,提高挖掘算法性能一直是个难题.耶是通过数据库混合投影来挖掘频繁模式完全集的全新算法.HP混合投影思想是:任意数据集都不能简单地归入某个单一特性类别,挖掘过程应根据局部数据子集的特性变化动态地调整频繁模式树构造策略、事务子集表示形式、投影方法.HP提出基于树表示的虚拟投影与基于数组表示的非过滤投影,较好地解决了提高时间效率与节省内存空间的矛盾.实验表明,HP时间效率比Apriori,FP—Growth和H-Mine高出1~3个数量级,并且空间可伸缩性也大大优于这些算法.  相似文献   

16.
基于经典的BIDE算法,提出一种多核并行闭合序列模式挖掘算法——MT_BIDE。该算法在频繁序列扩展判断前进行剪枝,在扩展过程中动态调整频繁序列及其伪投影数据集,平衡不同线程间挖掘闭合序列模式的计算量差异。实验结果表明,该算法具有较高的运行效率和加速比。  相似文献   

17.
快速挖掘全局频繁项目集   总被引:32,自引:1,他引:32  
分布式环境中,全局频繁项目集的挖掘是数据挖掘中最重要的研究课题之一.传统的全局频繁项目集挖掘算法采用Apriori算法框架,须多遍扫描数据库并产生大量的候选项目集,且通过传送局部频繁项目集求全局频繁项目集的网络通信代价高.为此,提出了一种分布数据库的全局频繁项目集快速挖掘算法——FMAGF.FMAGF算法采用传送条件频繁模式树或条件模式基来挖掘全局频繁项目集,可有效地减小网络通信量,提高全局频繁项目集挖掘效率.理论分析和实验结果表明提出的算法是有效可行的.  相似文献   

18.
Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist a large number of patterns and/or long patterns.In this study, we propose a novel frequent-pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a condensed, smaller data structure, FP-tree which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern-fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent-pattern mining methods.  相似文献   

19.
目前已提出了许多基于Apriori算法思想的频繁项目集挖掘算法,这些算法可以有效地挖掘出事务数据库中的短频繁项目集,但对于长频繁项目集的挖掘而言,其性能将明显下降.为此,提出了一种频繁闭项目集挖掘算法MFCIA,该算法可以有效地挖掘出事务数据库中所有的频繁项目集,并对其更新问题进行了研究,提出了一种相应的频繁闭项目集增量式更新算法UMFCIA,该算法将充分利用先前的挖掘结果来节省发现新的频繁闭项目集的时间开销.实验结果表明算法MFCIA是有效可行的.  相似文献   

20.
陶再平 《计算机工程与设计》2007,28(7):1730-1731,F0003
序列模式挖掘是数据挖掘领域中十分重要的研究课题.目前已有许多算法用于序列模式的挖掘,但在序列模式增量式更新方面的研究还比较少,针对这种情况提出了序列模式增量式更新的挖掘算法SPIU.SPIU算法充分利用了原有的挖掘结果,并对产生的候选频繁序列进行剪枝,有效地减小了候选频繁序列的大小,从而很好地改善了挖掘效率.测试结果表明SPIU算法是正确和高效的,另外算法还具有很好的扩放性.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号