Found 20 similar documents; search took 31 ms
1.
Conventional algorithms for mining association rules build larger itemsets by combining smaller large itemsets. This paper presents a new efficient algorithm, named cluster-decomposition association rule (CDAR), which combines the cluster concept with the decomposition of larger candidate itemsets and proceeds from mining the maximal large itemsets down to the large 1-itemsets. First, the CDAR method creates clusters by reading the database only once, assigning each transaction record of length k to the kth cluster. Then, the large k-itemsets are generated by comparison with the kth cluster only, unlike the combination approach, which compares against the entire database. Experiments with real-life databases show that CDAR outperforms Apriori, a well-known and widely used association rule mining algorithm.
2.
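Many algorithms have been proposed for efficiently discovering association rules in large-scale databases, but they all focus on the frequent itemsets that satisfy the minimum support; none addresses how to efficiently derive, from the frequent itemsets, the association rules that satisfy the minimum confidence. To address this gap, an efficient association rule mining algorithm, EA, is proposed, which solves the problem of efficiently mining the rules that satisfy the minimum confidence during association rule mining.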
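For orientation, the rule-generation step that this abstract targets can be sketched as follows. This is a minimal brute-force baseline, not the EA algorithm itself, whose details the abstract does not give. The function name `generate_rules`, the `supports` dictionary layout, and the toy counts are all assumptions made for illustration.

```python
from itertools import combinations


def generate_rules(supports, min_conf):
    """Return (antecedent, consequent, confidence) for each rule meeting min_conf.

    `supports` maps frozenset itemsets to support counts and is assumed to
    contain every frequent itemset together with all of its non-empty subsets.
    """
    rules = []
    for itemset, count in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                # confidence(A -> B) = support(A ∪ B) / support(A)
                confidence = count / supports[antecedent]
                if confidence >= min_conf:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical support counts, for illustration only.
example_supports = {
    frozenset({"a"}): 4,
    frozenset({"b"}): 5,
    frozenset({"a", "b"}): 4,
}
example_rules = generate_rules(example_supports, min_conf=0.9)
```

With these toy counts, only the rule a -> b (confidence 4/4) passes the 0.9 threshold; b -> a (confidence 4/5) is rejected. The sketch enumerates every antecedent of every frequent itemset, which is the cost that more efficient rule-generation methods aim to reduce.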
3.
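An efficient incremental updating algorithm for association rules    Total citations: 3, self-citations: 0, citations by others: 3
This paper studies the key ideas and performance of the FUP algorithm for mining association rules and proposes an improved FUP algorithm, SFUP. The algorithm makes full use of the support counts of the candidate frequent itemsets in the existing mining results, which effectively reduces the number of repeated database scans. The two algorithms are compared experimentally, and the results show that SFUP is clearly more efficient than FUP.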
4.
EDUA: An efficient algorithm for dynamic database mining    Total citations: 1, self-citations: 0, citations by others: 1
Maintaining frequent itemsets (patterns) is one of the most important issues faced by the data mining community. While many algorithms for pattern discovery have been developed, relatively little work has been reported on mining dynamic databases, a major area of application in this field. In this paper, a new algorithm, namely the Efficient Dynamic Database Updating Algorithm (EDUA), is designed for mining dynamic databases. It works well when data deletion is carried out in any subset of a database that is partitioned according to the arrival time of the data. A pruning technique is proposed for improving the efficiency of the EDUA algorithm. Extensive experiments are conducted to evaluate the proposed approach and it is demonstrated that the EDUA is efficient.
5.
Progressive partition miner: an efficient algorithm for mining general temporal association rules    Total citations: 1, self-citations: 0, citations by others: 1
Chang-Hung Lee Ming-Syan Chen Cheng-Ru Lin 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(4):1004-1017
We explore a new problem of mining general temporal association rules in publication databases. In essence, a publication database is a set of transactions where each transaction T is a set of items of which each item contains an individual exhibition period. The current model of association rule mining is not able to handle the publication database due to the following fundamental problems, i.e., 1) lack of consideration of the exhibition period of each individual item and 2) lack of an equitable support counting basis for each item. To remedy this, we propose an innovative algorithm progressive-partition-miner (abbreviated as PPM) to discover general temporal association rules in a publication database. The basic idea of PPM is to first partition the publication database in light of exhibition periods of items and then progressively accumulate the occurrence count of each candidate 2-itemset based on the intrinsic partitioning characteristics. Algorithm PPM is also designed to employ a filtering threshold in each partition to prune out early those cumulatively infrequent 2-itemsets. The feature that the number of candidate 2-itemsets generated by PPM is very close to the number of frequent 2-itemsets allows us to employ the scan reduction technique to effectively reduce the number of database scans. The execution time of PPM is orders of magnitude smaller than that required by other competitive schemes that are directly extended from existing methods. The correctness of PPM is proven and some of its theoretical properties are derived. Sensitivity analysis of various parameters is conducted to provide many insights into Algorithm PPM.
6.
Bilal Alataş Erhan Akin 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2006,10(3):230-237
In this paper, a genetic algorithm (GA) is proposed as a search strategy for mining not only positive but also negative quantitative association rules (ARs) within databases. Unlike conventional methods, ARs are mined directly without generating frequent itemsets. The proposed GA takes a database-independent approach that does not rely on the minimum support and minimum confidence thresholds, which are hard to determine for each database. Instead of a randomly generated initial population, a uniform population is used, which forces the initial population to be not far from the solutions and distributes it uniformly over the feasible region. An adaptive mutation probability, a new operator called the uniform operator that ensures genetic diversity, and an efficient adjusted fitness function are used for mining all interesting ARs from the last population in only a single run of the GA. The efficiency of the proposed GA is validated on synthetic and real databases.
7.
Thu-Lan Dam Kenli Li Philippe Fournier-Viger Quang-Huy Duong 《Applied Intelligence》2016,45(1):96-111
Mining top-rank-k frequent patterns is a popular data mining task, which consists of discovering the patterns in a transaction database that belong to the k first ranks in terms of support. Although several algorithms have been proposed for this task, it remains computationally expensive. To address this issue, this paper proposes a novel algorithm named BTK. It relies on a novel tree structure named TB-tree to store crucial information about frequent patterns. Moreover, BTK employs a new B-list structure to store information about patterns, and relies on subsume indexes to reduce the search space and speed up the discovery of top-rank-k frequent patterns. BTK also uses an early pruning strategy and an effective threshold raising mechanism. Additionally, BTK introduces two efficient procedures for respectively generating subsume indexes and intersecting B-lists. Extensive experiments were conducted on several datasets to evaluate the efficiency of the proposed algorithm. Results show that BTK is highly efficient and competitive.
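To make the top-rank-k task itself concrete, the following is a brute-force sketch of its definition. It is not BTK and uses none of the TB-tree, B-list, or subsume-index structures named in the abstract. It assumes that patterns with equal support share a rank, so the k first ranks correspond to the k highest distinct support values; the function name `top_rank_k` and the toy database are hypothetical.

```python
from itertools import combinations


def top_rank_k(transactions, k):
    """Return {pattern: support} for patterns in the k highest distinct support ranks.

    Exhaustive enumeration of every sub-itemset: only suitable for tiny inputs.
    """
    support = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for pattern in combinations(items, size):
                support[pattern] = support.get(pattern, 0) + 1
    # Assumption: rank r is the r-th largest distinct support value.
    kept_values = sorted(set(support.values()), reverse=True)[:k]
    return {p: s for p, s in support.items() if s in kept_values}


# Hypothetical toy database.
toy_transactions = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["a"]]
toy_result = top_rank_k(toy_transactions, k=2)
```

On this toy database the distinct support values are 4, 2, and 1, so k = 2 keeps the pattern (a) with support 4 and the four patterns with support 2. The exponential enumeration is exactly the cost that pruning and threshold raising are meant to avoid.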
8.
In this paper, we propose an efficient method for mining all frequent inter-transaction patterns. The method consists of two phases. First, we devise two data structures: a dat-list, which stores the item information used to find frequent inter-transaction patterns; and an ITP-tree, which stores the discovered frequent inter-transaction patterns. In the second phase, we apply an algorithm, called ITP-Miner (Inter-Transaction Patterns Miner), to mine all frequent inter-transaction patterns. By using the ITP-tree, the algorithm requires only one database scan and can localize joining, pruning, and support counting to a small number of dat-lists. The experimental results show that the ITP-Miner algorithm outperforms the FITI (First Intra Then Inter) algorithm by one order of magnitude.
9.
《Knowledge》2007,20(4):329-335
Mining frequent itemsets in transaction databases, time-series databases and many other kinds of databases is an important task that has been widely studied in data mining research. The problem of mining frequent itemsets can be solved by first constructing a candidate set of itemsets and then identifying the itemsets within this candidate set that meet the frequent itemset requirement. Most previous research focuses on pruning to reduce the number of candidate itemsets and the number of database scans. However, many algorithms adopt an Apriori-like approach to candidate itemset generation and support counting, which is the most time-consuming step. To address this issue, the paper proposes an effective algorithm named BitTableFI. In the algorithm, a special data structure, BitTable, is used horizontally and vertically to compress the database for quick candidate itemset generation and support counting, respectively. The data structure can also be used in many Apriori-like algorithms to improve their performance. Experiments with both synthetic and real databases show that BitTableFI outperforms Apriori and CBAR, which uses ClusterTable for quick support counting.
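The general idea of bitmap-based support counting can be sketched as below. This illustrates only the vertical bit-vector technique using Python integers as bitsets; it does not reproduce BitTableFI's actual BitTable layout or its candidate generation, which the abstract does not specify. The names `build_bit_columns` and `bit_support` and the toy database are assumptions.

```python
def build_bit_columns(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    columns = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            columns[item] = columns.get(item, 0) | (1 << tid)
    return columns


def bit_support(columns, itemset, n_transactions):
    """Count the transactions containing every item by AND-ing the item columns."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= columns.get(item, 0)  # an unknown item clears every bit
    return bin(mask).count("1")  # population count of the surviving bits


# Hypothetical toy database.
bit_db = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b"]]
bit_columns = build_bit_columns(bit_db)
ab_support = bit_support(bit_columns, ["a", "b"], len(bit_db))
```

Here item a occupies transactions 0 to 2 and item b occupies transactions 0, 1, and 3, so the AND leaves two bits set and `ab_support` is 2. Support counting becomes a few word-level operations instead of a scan over transactions.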
10.
Duy-Tai Dinh Bac Le Philippe Fournier-Viger Van-Nam Huynh 《Applied Intelligence》2018,48(12):4694-4714
A periodic high-utility sequential pattern (PHUSP) is a pattern that not only yields a high utility (e.g. high profit) but also appears regularly in a sequence database. Finding PHUSPs is useful for several applications such as market basket analysis, where it can reveal recurring and profitable customer behavior. Although discovering PHUSPs is desirable, it is computationally difficult. To discover PHUSPs efficiently, this paper proposes a structure for periodic high-utility sequential pattern mining (PHUSPM) named PUSP. Furthermore, to reduce the search space and speed up PHUSPM, a pruning strategy is developed. This results in an efficient algorithm called periodic high-utility sequential pattern optimal miner (PUSOM). An experimental evaluation was performed on both synthetic and real-life datasets to compare the performance of PUSOM with state-of-the-art PHUSPM algorithms in terms of execution time, memory usage and scalability. Experimental results show that the PUSOM algorithm can efficiently discover the complete set of PHUSPs. Moreover, it outperforms the other four algorithms because it can prune many unpromising patterns using its designed structure and pruning strategy.
11.
The purpose of mining frequent itemsets is to identify the items in groups that always appear together and exceed the user-specified threshold of a transaction database. However, numerous frequent itemsets may exist in a transaction database, hindering decision making. Recently, the mining of frequent closed itemsets has become a major research issue because sets of frequent closed itemsets are condensed yet complete representations of frequent itemsets. Therefore, all frequent itemsets can be derived from a group of frequent closed itemsets. Nonetheless, the number of transactions in a transaction database can increase rapidly in a short time period, and a number of the transactions may be outdated. Thus, frequent closed itemsets may be changed with the addition of new transactions or the deletion of old transactions from the transaction database. Updating previously closed itemsets when transactions are added or removed from the transaction database is challenging. This study proposes an efficient algorithm for incrementally mining frequent closed itemsets without scanning the original database. The proposed algorithm updates closed itemsets by performing several operations on the previously closed itemsets and added/deleted transactions without searching the previously closed itemsets. The experimental results show that the proposed algorithm significantly outperforms previous methods, which require a substantial length of time to search previously closed itemsets.
12.
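An efficient association rule mining algorithm    Total citations: 2, self-citations: 0, citations by others: 2
To address the drawbacks of the Apriori algorithm, namely its multiple database scans and the large number of candidate itemsets it generates, a database optimization strategy is proposed. Combined with frequent-itemset pruning and a join optimization strategy, this yields a new association rule mining algorithm, NApriori. The algorithm reduces the size of the database and the number of candidate itemsets, and avoids repeated comparisons of identical items during the join step. Experiments show that this method performs better than the Apriori algorithm.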
13.
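Traditional association rule mining is unidirectional: it cannot identify mutually dependent rules, and the rules it finds are not necessarily meaningful and may even be wrong. In view of this, this paper proposes, on the basis of an analysis, a bidirectional association rule mining algorithm, and uses the correlation of the rules to identify those that are meaningful.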
14.
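Discovering association rules among items in transaction datasets is a classic data mining problem, but traditional association rule mining methods are relatively inefficient on large transaction datasets. Studies have shown that sampling techniques can effectively improve mining efficiency. Based on an analysis of existing sampling methods, a new efficient sampling-based association rule mining algorithm, ESMA, is proposed. The algorithm adopts a more effective bidirectional sampling strategy. Experimental analysis shows that the algorithm markedly speeds up sampling in large transaction databases, thereby reducing CPU time, and that it scales well.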
15.
《Expert systems with applications》2014,41(9):4309-4321
Detecting communities in social networks is a significant task in understanding the structures and functions of networks. Several methods have been developed to detect disjoint partitions. However, in real graphs vertices are often shared between communities, hence the notion of overlap. This case has recently attracted increasing attention, and many algorithms have been designed to solve it. In this paper, we propose an overlapping community detection algorithm called DOCNet (Detecting overlapping communities in Networks). The main strategy of this algorithm is to find an initial core and add suitable nodes to expand it until a stopping criterion is met. Experimental results on real-world social networks and computer-generated artificial graphs demonstrate that DOCNet is efficient and highly reliable for detecting overlapping groups, compared with four recent proposals.
16.
17.
18.
We present a new distributed association rule mining (D-ARM) algorithm that demonstrates superlinear speed-up with the number of computing nodes. The algorithm is the first D-ARM algorithm to perform a single scan over the database. As such, its performance is unmatched by any previous algorithm. Scale-up experiments over standard synthetic benchmarks demonstrate stable run time regardless of the number of computers. Theoretical analysis reveals a tighter bound on error probability than the one shown in the corresponding sequential algorithm. As a result of this tighter bound and by utilizing the combined memory of several computers, the algorithm generates far fewer candidates than comparable sequential algorithms, on the same order of magnitude as the optimum.
19.
The utility of an itemset is considered as the value of this itemset, and utility mining aims at identifying the itemsets with high utilities. The temporal high utility itemsets are the itemsets whose support is larger than a pre-specified threshold in the current time window of the data stream. Discovery of temporal high utility itemsets is an important process for mining interesting patterns like association rules from data streams. In this paper, we propose a novel method, namely THUI (Temporal High Utility Itemsets)-Mine, for mining temporal high utility itemsets from data streams efficiently and effectively. To the best of our knowledge, this is the first work on mining temporal high utility itemsets from data streams. The novel contribution of THUI-Mine is that it can effectively identify the temporal high utility itemsets by generating fewer candidate itemsets such that the execution time can be reduced substantially in mining all high utility itemsets in data streams. In this way, the process of discovering all temporal high utility itemsets under all time windows of data streams can be achieved effectively with less memory space and execution time. This meets the critical requirements on time and space efficiency for mining data streams. Through experimental evaluation, THUI-Mine is shown to significantly outperform other existing methods such as the Two-Phase algorithm under various experimental conditions.
20.
Mining frequent weighted itemsets (FWIs) from weighted-item transaction databases has recently received research interest. In real-world applications, sparse weighted-item transaction databases (SWITDs) are common. For example, supermarkets have many items, but each transaction has a small number of items. In this paper, we propose an interval word segment (IWS) structure to store and process tidsets for enhancing the effectiveness of mining FWIs from SWITDs. The IWS structure allows the intersection of tidsets between two itemsets to be performed very fast. A map array is proposed for storing a 1-bit index for words. From the map array, 1-bits are mapped to create the tidset of an itemset for faster calculation of the weighted support of itemsets. Experimental results for a number of SWITDs show that the method based on IWS structure outperforms existing methods.
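As a rough illustration of how tidset intersection feeds a weighted-support calculation, consider the sketch below. It uses plain Python sets rather than the IWS structure or the map array described in the abstract. It also assumes one common formulation that the abstract does not state: a transaction's weight is the mean weight of its items, and an itemset's weighted support is the total weight of the transactions containing it divided by the total weight of all transactions. The names `weighted_support`, `item_weights`, and `weighted_db` are hypothetical.

```python
def weighted_support(transactions, item_weights, itemset):
    """Weighted support of a non-empty itemset via tidset intersection.

    Assumes every transaction is non-empty and lists each item at most once.
    """
    # Tidset of an item: the ids of the transactions that contain it.
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)
    # Assumed transaction weight: mean weight of the items it contains.
    tx_weights = [
        sum(item_weights[item] for item in t) / len(t) for t in transactions
    ]
    # Tidset of the itemset: intersection of its items' tidsets.
    tids = set.intersection(*(tidsets.get(item, set()) for item in itemset))
    return sum(tx_weights[tid] for tid in tids) / sum(tx_weights)


# Hypothetical weights and toy database.
item_weights = {"a": 0.5, "b": 1.0}
weighted_db = [["a", "b"], ["a"], ["b"]]
ws_ab = weighted_support(weighted_db, item_weights, ["a", "b"])
```

In this toy case the transaction weights are 0.75, 0.5, and 1.0, and only the first transaction contains both items, so `ws_ab` is 0.75 / 2.25, or one third. The intersection step is the operation that a compact tidset representation is intended to accelerate on sparse data.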