期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Mining frequent closed patterns in pointset databases

Anthony J.T. Lee Wen-Kwang Tsao Po-Yin Chen Ming-Chih Lin Shih-Hui Yang 《Information Systems》2010

In this paper, we proposed an efficient algorithm, called PCP-Miner (Pointset Closed Pattern Miner), for mining frequent closed patterns from a pointset database, where a pointset contains a set of points. Our proposed algorithm consists of two phases. First, we find all frequent patterns of length two in the database. Second, for each pattern found in the first phase, we recursively generate frequent closed patterns by a frequent pattern tree in a depth-first search manner. Since the PCP-Miner does not generate unnecessary candidates, it is more efficient and scalable than the modified Apriori, SASMiner and MaxGeo. The experimental results show that the PCP-Miner algorithm outperforms the comparing algorithms by more than one order of magnitude. 相似文献

2.

Closed inter-sequence pattern mining

《Journal of Systems and Software》2013,86(6):1603-1612

Inter-sequence pattern mining can find associations across several sequences in a sequence database, which can discover both a sequential pattern within a transaction and sequential patterns across several different transactions. However, inter-sequence pattern mining algorithms usually generate a large number of recurrent frequent patterns. We have observed mining closed inter-sequence patterns instead of frequent ones can lead to a more compact yet complete result set. Therefore, in this paper, we propose a model of closed inter-sequence pattern mining and an efficient algorithm called CISP-Miner for mining such patterns, which enumerates closed inter-sequence patterns recursively along a search tree in a depth-first search manner. In addition, several effective pruning strategies and closure checking schemes are designed to reduce the search space and thus accelerate the algorithm. Our experiment results demonstrate that the proposed CISP-Miner algorithm is very efficient and outperforms a compared EISP-Miner algorithm in most cases. 相似文献

3.

Discovering Transitional Patterns and Their Significant Milestones in Transaction Databases

Wan Qian An Aijun 《Knowledge and Data Engineering, IEEE Transactions on》2009,21(12):1692-1707

A transaction database usually consists of a set of time-stamped transactions. Mining frequent patterns in transaction databases has been studied extensively in data mining research. However, most of the existing frequent pattern mining algorithms (such as Apriori and FP-growth) do not consider the time stamps associated with the transactions. In this paper, we extend the existing frequent pattern mining framework to take into account the time stamp of each transaction and discover patterns whose frequency dramatically changes over time. We define a new type of patterns, called transitional patterns, to capture the dynamic behavior of frequent patterns in a transaction database. Transitional patterns include both positive and negative transitional patterns. Their frequencies increase/decrease dramatically at some time points of a transaction database. We introduce the concept of significant milestones for a transitional pattern, which are time points at which the frequency of the pattern changes most significantly. Moreover, we develop an algorithm to mine from a transaction database the set of transitional patterns along with their significant milestones. Our experimental studies on real-world databases illustrate that mining positive and negative transitional patterns is highly promising as a practical and useful approach for discovering novel and interesting knowledge from large databases. 相似文献

4.

Mining frequent trajectory patterns in spatial-temporal databases 总被引：1，自引：0，他引：1

Anthony J.T. Lee Yi-An Chen 《Information Sciences》2009,179(13):2218-2796

In this paper, we propose an efficient graph-based mining (GBM) algorithm for mining the frequent trajectory patterns in a spatial-temporal database. The proposed method comprises two phases. First, we scan the database once to generate a mapping graph and trajectory information lists (TI-lists). Then, we traverse the mapping graph in a depth-first search manner to mine all frequent trajectory patterns in the database. By using the mapping graph and TI-lists, the GBM algorithm can localize support counting and pattern extension in a small number of TI-lists. Moreover, it utilizes the adjacency property to reduce the search space. Therefore, our proposed method can efficiently mine the frequent trajectory patterns in the database. The experimental results show that it outperforms the Apriori-based and PrefixSpan-based methods by more than one order of magnitude. 相似文献

5.

基于线索频繁模式树的关联规则产生算法

李川范明《计算机工程与应用》2004,40(4):188-192

虽然FP-Growth算法能够有效地从数据库中挖掘频繁模式,但如何由其挖掘出的频繁模式中高效地产生关联规则仍是一个相当复杂的问题。该文提出了用于组织频繁模式的线索频繁模式树(TFPT)和一个从TFPT中挖掘关联规则的高效算法—最短模式优先算法(SPF)。挖掘模式Y的关联规则时,SPF算法应用了两个优化策略,避免了对大量的不可能成为规则XY-X左部的Y的子集的检查,从而获得了很好的性能。实验表明:与类FP-Growth算法结合时,SPF算法运行速度远远快于Apriori算法,并有相当好的可伸缩性。相似文献

6.

Frequent Closed Sequence Mining without Candidate Maintenance

Jianyong Wang Jiawei Han Chun Li 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(8):1042-1056

Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones because the latter leads to not only a more compact yet complete result set but also better efficiency. However, most of the previously developed closed pattern mining algorithms work under the candidate maintenance-and- test paradigm, which is inherently costly in both runtime and space usage when the support threshold is low or the patterns become long. In this paper, we present BIDE, an efficient algorithm for mining frequent closed sequences without candidate maintenance. It adopts a novel sequence closure checking scheme called Bl-Directional Extension and prunes the search space more deeply compared to the previous algorithms by using the BackScan pruning method. A thorough performance study with both sparse and dense, real, and synthetic data sets has demonstrated that BIDE significantly outperforms the previous algorithm: It consumes an order(s) of magnitude less memory and can be more than an order of magnitude faster. It is also linearly scalable in terms of database size. 相似文献

7.

Efficient data mining for calling path patterns in GSM networks

Anthony J. T. Lee Yao-Te Wang 《Information Systems》2003,28(8):929-948

In this paper, we explore a new data mining capability that involves mining calling path patterns in global system for mobile communication (GSM) networks. Our proposed method consists of two phases. First, we devise a data structure to convert the original calling paths in the log file into a frequent calling path graph. Second, we design an algorithm to mine the calling path patterns from the frequent calling path graph obtained. By using the frequent calling path graph to mine the calling path patterns, our proposed algorithm does not generate unnecessary candidate patterns and requires less database scans. If the corresponding calling path graph of the GSM network can be fitted in the main memory, our proposed algorithm scans the database only once. Otherwise, the cellular structure of the GSM network is divided into several partitions so that the corresponding calling path sub-graph of each partition can be fitted in the main memory. The number of database scans for this case is equal to the number of partitioned sub-graphs. Therefore, our proposed algorithm is more efficient than the PrefixSpan and a priori-like approaches. The experimental results show that our proposed algorithm outperforms the a priori-like and PrefixSpan approaches by several orders of magnitude. 相似文献

8.

Efficient Mining of Frequent Closed XML Query Pattern

下载免费PDF全文

Jian-Hua Feng Qian Qian Jian-Yong Wang and Li-Zhu Zhou 《计算机科学技术学报》2007,22(5):725-735

Previous research works have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. Upon discovery of frequent closed XML query patterns, indexing and caching can be effectively adopted for query performance enhancement. Most of the previous algorithms for finding frequent patterns basically introduced a straightforward generate-and-test strategy. In this paper, we present SOLARIA＊, an efficient algorithm for mining frequent closed XML query patterns without candidate maintenance and costly tree-containment checking. Efficient algorithm of sequence mining is involved in discovering frequent tree-structured patterns, which aims at replacing expensive containment testing with cheap parent-child checking in sequences. SOLARIA＊ deeply prunes unrelated search space for frequent pattern enumeration by parent-child relationship constraint. By a thorough experimental study on various real-life data, we demonstrate the efficiency and scalability of SOLARIA＊ over the previous known alternative. SOLARIA＊ is also linearly scalable in terms of XML queries＇ size. 相似文献

9.

一种高效的增量式序列模式挖掘算法

下载免费PDF全文

刘佳新《计算机工程》2012,38(12):39-41

现有的增量式挖掘算法在支持度发生变化时,需要对序列数据库进行重复挖掘,为减少由此产生的时空消耗,提出一种高效的增量式序列模式挖掘算法。算法采用频繁序列树作为序列存储结构,当序列数据库和最小支持度发生变化时,通过执行更新操作,实现频繁序列树的更新,利用深度优先遍历频繁序列树找到序列数据库中所有的序列模式。实验结果表明,与IncSpan算法和PrefixSpan算法相比,该算法的挖掘效率较高。相似文献

10.

长生物数据集中频繁闭合模式挖掘算法研究

下载免费PDF全文

周明李宏《计算机工程》2007,33(2):74-76

传统频繁项集挖掘算法在处理稠密或长数据集(如基因表达数据集)时效率低且产生大量冗余模式，为解决这些问题一些学者提出了闭合模式的概念和挖掘闭合模式的算法，研究证明挖掘闭合模式可以显著减少项集数量并消除大量冗余模式。该文针对生物数据特点提出了一个新颖的挖掘频繁闭合模式的算法REMFOR，该算法在闭合模式概念和行枚举思想的基础上，采用垂直数据结构和fp-tree技术，对行集建立行fp-tree来挖掘频繁闭合模式。通过实例和实验证明该算法是正确有效的。相似文献