期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Batch incremental processing for FP-tree construction using FP-Growth algorithm

Shashikumar G. Totad R. B. Geeta P. V. G. D. Prasad Reddy 《Knowledge and Information Systems》2012,33(2):475-490

In the present scenario of global economy and World Wide Web, large sets of evolving and distributed data can be handled efficiently by incremental data mining. Frequent patterns are very important in knowledge discovery and data mining process, such as mining of association rules, correlations. FP-tree is a very versatile data structure used for mining of frequent patterns in knowledge discovery and data mining process. FP-tree is a compact representation of transaction database that contains frequency information of all relevant frequent patterns (FP) of the database. All of the existing incremental frequent pattern mining algorithms, such as AFPIM, CATS, CanTree, CP-tree, and SPO-tree, perform incremental mining by processing one transaction of the incremental part of database at a time and updating it to the FP-tree of initial (original) database. Here, in this paper, we propose a novel method that takes advantage of FP-tree representation of incremental transaction database for incremental mining. We propose a batch incremental processing algorithm BIT_FPGrowth that restructures and merges two small consecutive duration FP-trees to obtain a FP-tree of the FP-Growth algorithm. Our BIT_FPGrowth uses FP-tree as preprocessed data repository to get transactions (i.e., item-sets), unlike other sequential incremental algorithms that read transactions from database. BIT_FPGrowth algorithm takes less time for constructing FP-tree. Our experimental results show that, as the size of the database increases, increase in runtime of BIT_FPGrowth is much less and is least of all the other algorithms. 相似文献

2.

Mining constrained inter-sequence patterns: a novel approach to cope with item constraints

Tuong Le Anh Nguyen Bao Huynh Bay Vo Witold Pedrycz 《Applied Intelligence》2018,48(5):1327-1343

Data mining has become increasingly important in the Internet era. The problem of mining inter-sequence pattern is a sub-task in data mining with several algorithms in the recent years. However, these algorithms only focus on the transitional problem of mining frequent inter-sequence patterns and most frequent inter-sequence patterns are either redundant or insignificant. As such, it can confuse end users during decision-making and can require too much system resources. This led to the problem of mining inter-sequence patterns with item constraints, which addressed the problem when end-users only concerned the patterns contained a number of specific items. In this paper, we propose two novel algorithms for it. First is the ISP-IC (Inter-Sequence Pattern with Item Constraint mining) algorithm based on a theorem that quickly determines whether an inter-sequence pattern satisfies the constraints. Then, we propose a way to improve the strategy of ISP-IC, which is then applied to the \(i\)ISP-IC algorithm to enhance the performance of the process. Finally, pi ISP-IC, a parallel version of \(i\)ISP-IC, will be presented. Experimental results show that pi ISP-IC algorithm outperforms the post-processing of the-state-of-the-art method for mining inter-sequence patterns (EISP-Miner), ISP-IC, and \(i\)ISP-IC algorithms in most of the cases. 相似文献

3.

PARM—An Efficient Algorithm to Mine Association Rules From Spatial Data

《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2008,38(6):1513-1524

Association rule mining, originally proposed for market basket data, has potential applications in many areas. Spatial data, such as remote sensed imagery (RSI) data, is one of the promising application areas. Extracting interesting patterns and rules from spatial data sets, composed of images and associated ground data, can be of importance in precision agriculture, resource discovery, and other areas. However, in most cases, the sizes of the spatial data sets are too large to be mined in a reasonable amount of time using existing algorithms. In this paper, we propose an efficient approach to derive association rules from spatial data using Peano Count Tree (P-tree) structure. P-tree structure provides a lossless and compressed representation of spatial data. Based on P-trees, an efficient association rule mining algorithm PARM with fast support calculation and significant pruning techniques is introduced to improve the efficiency of the rule mining process. The P-tree based Association Rule Mining (PARM) algorithm is implemented and compared with FP-growth and Apriori algorithms. Experimental results showed that our algorithm is superior for association rule mining on RSI spatial data. 相似文献

4.

RP-Miner: a relaxed prune algorithm for frequent similar pattern mining 总被引：2，自引：1，他引：1

Ansel Yoan Rodríguez-González José Francisco Martínez-Trinidad Jesús Ariel Carrasco-Ochoa José Ruiz-Shulcloper 《Knowledge and Information Systems》2011,27(3):451-471

相似文献

5.

Bottom-up discovery of frequent rooted unordered subtrees

Yijun Bei Gang Chen Lidan Shou Xiaoyan Li Jinxiang Dong 《Information Sciences》2009,179(1-2):70-88

In the past decade, XML has emerged as the standard language for information exchanging over the Internet. Due to its tree-structure paradigm, XML is superior for its capability of storing, querying, and manipulating complex data. Therefore, discovering frequent tree patterns over tree-structured data has become an interesting topic for XML data management. In this paper, we propose a tree mining algorithm, named BUXMiner, for finding a special class of frequent trees, called rooted unordered trees, from a tree-structured database. BUXMiner employs an efficient bottom-up approach to enumerate all candidate trees over a compact global tree guide and computes the frequent trees based on the tree guide. In addition to BUXMiner, we also propose a mining approach called BUMXMiner to discover the maximal frequent rooted unordered trees. We compare BUXMiner with previous tree-structure mining algorithms, namely XQPMinerTID and FastXMiner, which were also proposed to discover rooted unordered trees. The experimental results show that our algorithm outperforms XQPMinerTID and FastXMiner in terms of efficiency. The performance results from real-world applications also indicate the usefulness of our proposed tree mining algorithms in a variety of web applications, such as analysis of web page access patterns and mining frequent XML query patterns for caching. 相似文献

6.

Mining sequential patterns with periodic wildcard gaps

Youxi Wu Lingling Wang Jiadong Ren Wei Ding Xindong Wu 《Applied Intelligence》2014,41(1):99-116

Mining frequent patterns with periodic wildcard gaps is a critical data mining problem to deal with complex real-world problems. This problem can be described as follows: given a subject sequence, a pre-specified threshold, and a variable gap-length with wildcards between each two consecutive letters. The task is to gain all frequent patterns with periodic wildcard gaps. State-of-the-art mining algorithms which use matrices or other linear data structures to solve the problem not only consume a large amount of memory but also run slowly. In this study, we use an Incomplete Nettree structure (the last layer of a Nettree which is an extension of a tree) of a sub-pattern P to efficiently create Incomplete Nettrees of all its super-patterns with prefix pattern P and compute the numbers of their supports in a one-way scan. We propose two new algorithms, MAPB (Mining sequentiAl Pattern using incomplete Nettree with Breadth first search) and MAPD (Mining sequentiAl Pattern using incomplete Nettree with Depth first search), to solve the problem effectively with low memory requirements. Furthermore, we design a heuristic algorithm MAPBOK (MAPB for tOp-K) based on MAPB to deal with the Top-K frequent patterns for each length. Experimental results on real-world biological data demonstrate the superiority of the proposed algorithms in running time and space consumption and also show that the pattern matching approach can be employed to mine special frequent patterns effectively. 相似文献

7.

A novel approach for mining maximal frequent patterns

《Expert systems with applications》2017

Mining maximal frequent patterns (MFPs) is an approach that limits the number of frequent patterns (FPs) to help intelligent systems operate efficiently. Many approaches have been proposed for mining MFPs, but the complexity of the problem is enormous. Therefore, the run time and memory usage are still large. Recently, the N-list structure has been proposed and verified to be very effective for mining FPs, frequent closed patterns, and top-rank-k FPs. Therefore, this paper uses the N-list structure for mining MFPs. A pruning technique is also proposed to prune branches to reduce the search space. This technique is applied to an algorithm called INLA-MFP (improved N-list-based algorithm for mining maximal frequent patterns) for mining MFPs. Experiments were conducted to evaluate the effectiveness of the proposed algorithm. The experimental results show that INLA-MFP outperforms two state-of-the-art algorithms for mining MFPs. 相似文献

8.

Mining frequent patterns and association rules using similarities

Ansel Y. Rodríguez-González José Fco. Martínez-Trinidad Jesús A. Carrasco-Ochoa José Ruiz-Shulcloper 《Expert systems with applications》2013,40(17):6823-6836

相似文献

9.

Stream mining on univariate uncertain data

Ying-Ho Liu 《Applied Intelligence》2013,39(2):315-344

In this paper, we propose mining frequent patterns from univariate uncertain data streams, which have a quantitative interval for each attribute in a transaction and a probability density function indicating the possibilities that the values in the interval appear. Many data streams comprise flows of univariate uncertain data, for example, the records of atmospheric pollution sensors, and network monitoring records. We propose two algorithms to address this issue: the ExactU2Stream algorithm and the ApproxiU2Stream algorithm. The former incrementally stores the incoming transactions, and delays the mining process until it is requested. The latter mines the transactions immediately when they arrive, and stores the derived frequent patterns. Compared with the latter, the former returns results that are more accurate, but it also requires more response time. Both algorithms utilize the sliding window scheme, which decomposes the continuous data stream into discrete, overlapping chunks. The proposed algorithms outperform the compared methods in terms of runtime and memory usage. We have applied the two proposed algorithms to the data streams recording the air quality in Taiwan; the derived frequent patterns not only show the common air quality in Taiwan but also show the extremely bad air quality when a sand storm affects Taiwan. 相似文献

10.

Top-down mining of frequent closed patterns from very high dimensional data

Hongyan Liu Xiaoyu Wang 《Information Sciences》2009,179(7):899-127

Frequent pattern mining is an essential theme in data mining. Existing algorithms usually use a bottom-up search strategy. However, for very high dimensional data, this strategy cannot fully utilize the minimum support constraint to prune the rowset search space. In this paper, we propose a new method called top-down mining together with a novel row enumeration tree to make full use of the pruning power of the minimum support constraint. Furthermore, to efficiently check if a rowset is closed, we develop a method called the trace-based method. Based on these methods, an algorithm called TD-Close is designed for mining a complete set of frequent closed patterns. To enhance its performance further, we improve it by using new pruning strategies and new data structures that lead to a new algorithm TTD-Close. Our performance study shows that the top-down strategy is effective in cutting down search space and saving memory space, while the trace-based method facilitates the closeness-checking. As a result, the algorithm TTD-Close outperforms the bottom-up search algorithms such as Carpenter and FPclose in most cases. It also runs faster than TD-Close. 相似文献

11.

A Theory of Evidence-based method for assessing frequent patterns

Francisco Guil Roque Marín 《Expert systems with applications》2013,40(8):3121-3127

相似文献

12.

Mining spatial association rules in image databases 总被引：2，自引：0，他引：2

Anthony J.T. Lee Ruey-Wen Hong Wen-Kwang Tsao 《Information Sciences》2007,177(7):1593-1608

In this paper, we propose a novel spatial mining algorithm, called 9DLT-Miner, to mine the spatial association rules from an image database, where every image is represented by the 9DLT representation. The proposed method consists of two phases. First, we find all frequent patterns of length one. Next, we use frequent k-patterns (k ? 1) to generate all candidate (k + 1)-patterns. For each candidate pattern generated, we scan the database to count the pattern’s support and check if it is frequent. The steps in the second phase are repeated until no more frequent patterns can be found. Since our proposed algorithm prunes most of impossible candidates, it is more efficient than the Apriori algorithm. The experiment results show that 9DLT-Miner runs 2-5 times faster than the Apriori algorithm. 相似文献

13.

Mining frequent closed patterns in pointset databases

Anthony J.T. Lee Wen-Kwang Tsao Po-Yin Chen Ming-Chih Lin Shih-Hui Yang 《Information Systems》2010

In this paper, we proposed an efficient algorithm, called PCP-Miner (Pointset Closed Pattern Miner), for mining frequent closed patterns from a pointset database, where a pointset contains a set of points. Our proposed algorithm consists of two phases. First, we find all frequent patterns of length two in the database. Second, for each pattern found in the first phase, we recursively generate frequent closed patterns by a frequent pattern tree in a depth-first search manner. Since the PCP-Miner does not generate unnecessary candidates, it is more efficient and scalable than the modified Apriori, SASMiner and MaxGeo. The experimental results show that the PCP-Miner algorithm outperforms the comparing algorithms by more than one order of magnitude. 相似文献

14.

Constraint graph-based frequent pattern updating from temporal databases

Jason J. Jung 《Expert systems with applications》2012,39(3):3169-3173

There have been many kinds of association rule mining (ARM) algorithms, e.g., Apriori and FP-tree, to discover meaningful frequent patterns from a large dataset. Particularly, it is more difficult for such ARM algorithms to be applied for temporal databases which are continuously changing over time. Such algorithms are generally based on repeating time-consuming tasks, e.g., scanning databases. To deal with this problem, in this paper, we propose a constraint graph-based method for maintaining frequent patterns (FP) discovered from the temporal databases. Particularly, the constraint graph, which is represented as a set of constraint between two items, can be established by temporal persistency of the patterns. It means that some patterns can be used to build the constraint graph, when the patterns have been shown in a set of the FP. Two types of constraints can be generated by users and adaptation. Based on our scheme, we find that a large number of dataset has been efficiently reduced during mining process and the gathering information while updating. 相似文献

15.

Mining rooted ordered trees under subtree homeomorphism

Mostafa Haghir Chehreghani Maurice Bruynooghe 《Data mining and knowledge discovery》2016,30(5):1249-1272

Mining frequent tree patterns has many applications in different areas such as XML data, bioinformatics and World Wide Web. The crucial step in frequent pattern mining is frequency counting, which involves a matching operator to find occurrences (instances) of a tree pattern in a given collection of trees. A widely used matching operator for tree-structured data is subtree homeomorphism, where an edge in the tree pattern is mapped onto an ancestor-descendant relationship in the given tree. Tree patterns that are frequent under subtree homeomorphism are usually called embedded patterns. In this paper, we present an efficient algorithm for subtree homeomorphism with application to frequent pattern mining. We propose a compact data-structure, called occ, which stores only information about the rightmost paths of occurrences and hence can encode and represent several occurrences of a tree pattern. We then define efficient join operations on the occ data-structure, which help us count occurrences of tree patterns according to occurrences of their proper subtrees. Based on the proposed subtree homeomorphism method, we develop an effective pattern mining algorithm, called TPMiner. We evaluate the efficiency of TPMiner on several real-world and synthetic datasets. Our extensive experiments confirm that TPMiner always outperforms well-known existing algorithms, and in several cases the improvement with respect to existing algorithms is significant. 相似文献

16.

Mining Frequent Generalized Itemsets and Generalized Association Rules Without Redundancy 总被引：1，自引：0，他引：1

下载免费PDF全文

Daniel Kunkle 张冬晖 Gene Cooperman 《计算机科学技术学报》2008,23(1):77-02

This paper presents some new algorithms to efficiently mine max frequent generalized itemsets （g-itemsets） and essential generalized association rules （g-rules）. These are compact and general representations for all frequent patterns and all strong association rules in the generalized environment. Our results fill an important gap among algorithms for frequent patterns and association rules by combining two concepts. First, generalized itemsets employ a taxonomy of items, rather than a flat list of items. This produces more natural frequent itemsets and associations such as （meat, milk） instead of （beef, milk）, （chicken, milk）, etc. Second, compact representations of frequent itemsets and strong rules, whose result size is exponentially smaller, can solve a standard dilemma in mining patterns： with small threshold values for support and confidence, the user is overwhelmed by the extraordinary number of identified patterns and associations; but with large threshold values, some interesting patterns and associations fail to be identified. Our algorithms can also expand those max frequent g-itemsets and essential g-rules into the much larger set of ordinary frequent g-itemsets and strong g-rules. While that expansion is not recommended in most practical cases, we do so in order to present a comparison with existing algorithms that only handle ordinary frequent g-itemsets. In this case, the new algorithm is shown to be thousands, and in some cases millions, of the time faster than previous algorithms. Further, the new algorithm succeeds in analyzing deeper taxonomies, with the depths of seven or more. Experimental results for previous algorithms limited themselves to taxonomies with depth at most three or four. In each of the two problems, a straightforward lattice-based approach is briefly discussed and then a classificationbased algorithm is developed. In particular, the two classification-based algorithms are MFGI_class for mining max frequent g-itemsets and EGR_class for mining essential g-rules. The classification-based algorithms are featured with conceptual classification trees and dynamic generation and pruning algorithms. 相似文献

17.

Mining generalized temporal patterns based on fuzzy counting

Francisco Guil Antonio Bailón José A. Álvarez Roque Marín 《Expert systems with applications》2013,40(4):1296-1304

Event-based sequences are a kind of pattern based on temporal associations with two essential characteristics: they are syntactically simple and have a great expressive power. For this reason, event-based sequence mining is an interesting solution to the problem of knowledge discovery in dynamic domains, mainly characterized by a time-varying nature. The inter-transactional model has led to the design of algorithms aimed to obtain this sort of patterns from time-stamped datasets. These algorithms extend the well-known Apriori algorithm, by explicitly adding the temporal context where associations among frequent events occurs. This leads to the possibility of extracting a larger number of patterns with a potential interest in decision making. However, its usefulness is diminished in those datasets where the characteristics of variability and uncertainty are present, which is a common issue in real domains. This is due to the rigidity of the counting method, which uses an exact measure of distance between temporal events. As a solution, we propose a generalization of the temporal mining process, which implies a relaxation of the counting method including the concept of approximate temporal distance between events. In particular, in this paper we present an algorithm, called TSET^fuzzy-Miner, which incorporates a fuzzy-based counting technique in order to extract general, flexible, and practical temporal patterns taking into account the particular characteristics of real domains. 相似文献

18.

Parallel Mining of Neuronal Spike Streams on Graphics Processing Units

Yong Cao Debprakash Patnaik Sean Ponce Jeremy Archuleta Patrick Butler Wu-chun Feng Naren Ramakrishnan 《International journal of parallel programming》2012,40(6):605-632

Multi-electrode arrays (MEAs) provide dynamic and spatial perspectives into brain function by capturing the temporal behavior of spikes recorded from cultures and living tissue. Understanding the firing patterns of neurons implicit in these spike trains is crucial to gaining insight into cellular activity. We present a solution involving a massively parallel graphics processing unit (GPU) to mine spike train datasets. We focus on mining frequent episodes of firing patterns that capture coordinated events even in the presence of intervening background events. We present two algorithmic strategies—hybrid mining and two-pass elimination—to map the finite state machine-based counting algorithms onto GPUs. These strategies explore different computation-to-core mapping schemes and illustrate innovative parallel algorithm design patterns for temporal data mining. We also provide a multi-GPU mining framework, which exhibits additional performance enhancement. Together, these contributions move us towards a real-time solution to neuronal data mining. 相似文献

19.

A fast compound algorithm for mining generators,closed itemsets,and computing links between equivalence classes

Laszlo Szathmary Petko Valtchev Amedeo Napoli Robert Godin Alix Boc Vladimir Makarenkov 《Annals of Mathematics and Artificial Intelligence》2014,70(1-2):81-105

In pattern mining and association rule mining, there is a variety of algorithms for mining frequent closed itemsets (FCIs) and frequent generators (FGs), whereas a smaller part further involves the precedence relation between FCIs. The interplay of these three constructs and their joint computation have been studied within the formal concept analysis (FCA) field yet none of the proposed algorithms is scalable. In frequent pattern mining, at least one suite of efficient algorithms has been designed that exploits basically the same ideas and follows the same overall computational schema. Based on an in-depth analysis of the aforementioned interplay that is rooted in a fundamental duality from hypergraph theory, we propose a new schema that should enable for a more parsimonious computation. We exemplify the new schema in the design of Snow-Touch, a concrete FCI/FG/precedence miner that reuses an existing algorithm, Charm, for mining FCIs, and completes it with two original methods for mining FGs and precedence, respectively. The performance of Snow-Touch and of its closest competitor, Charm-L, were experimentally compared using a large variety of datasets. The outcome of the experimental study suggests that our method outperforms Charm-L on dense data while on sparse one the trend is reversed. Furthermore, we demonstrate the usefulness of our method and the new schema through an application to the analysis of a genome dataset. The initial results reported here confirm the capacity of the method to focus on significant associations. 相似文献

20.

Mining top-k frequent patterns over data streams sliding window

Hui Chen 《Journal of Intelligent Information Systems》2014,42(1):111-131

Frequent pattern mining in data streams is an important research topic in the data mining community. In previous studies, a minimum support threshold was assumed to be available for mining frequent patterns. However, setting such a threshold is typically difficult. Hence, it is more reasonable to ask users to set a bound on the result size. The present study considers mining top-k frequent patterns from data streams using a sliding window technique. A single-pass algorithm, called MSWTP, is developed for the generation of top-k frequent patterns without a threshold. In the method, the content of the transactions in the sliding window is incrementally maintained in a summary data structure, named SWTP-tree, by scanning the stream only once. To make the mining operation efficient, insignificant patterns are distinguished from others by applying the Chernoff bound. Two kinds of obsolete pattern and one kind of insignificant pattern are periodically pruned from the pattern tree. Whenever necessary, the k most frequent patterns can be selected from SWTP-tree in order of their descending frequency. The performance of the proposed technique is evaluated via simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms. 相似文献