Similar literature
20 similar documents found.
1.
Association rule mining is an important data analysis method for discovering associations within data. Many studies have focused on finding fuzzy association rules in transaction databases. In the real world, however, relatively infrequent data may be available alongside frequent data. From infrequent data, we can find a set of rare itemsets that help teachers identify which students need extra help in learning. Although previous association rule discovery techniques can discover some rules based on frequency, frequency alone is insufficient to determine the importance of a rule composed of frequency-based data items. To remedy this problem, we develop a new algorithm based on the Apriori approach to mine fuzzy specific rare itemsets from quantitative data. Fuzzy association rules can then be generated from these fuzzy specific rare itemsets. The resulting patterns are useful for discovering learning problems. Experimental results show that the proposed approach is able to discover interesting and valuable patterns from the survey data.

2.
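The sheer number of frequent itemsets is one of the main problems that association rule visualization must solve. This paper presents a method for visualizing frequent itemsets and association rules based on parallel coordinates and item taxonomy trees. First, a display border is set within the frequent itemsets, and the closure property of frequent itemsets is used to prune large frequent itemsets. Then, combined with the overview+detail viewpoint control technique, the user interactively selects the frequent itemsets at a node of interest, and these are shown in full in the detail window, which yields an interactive visualization of frequent itemsets and association rules.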

3.
A central part of many algorithms for mining association rules in large data sets is a procedure that finds so-called frequent itemsets. The number of frequent itemsets becomes very large as transaction data grows. This paper proposes a new approach to finding frequent itemsets that employs rough set theory and can extract association rules for each homogeneous cluster of transaction data records, as well as relationships between different clusters. The paper presents an algorithm that reduces a large number of itemsets in order to find valid association rules.

4.
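Research on a Power-Set-Based Association Rule Mining Algorithm   Total citations: 13, self-citations: 2, other citations: 13
This paper is the first to propose using the power set as a tool for mining association rules, and it gives a power-set-based association rule mining algorithm. The algorithm removes the need of traditional algorithms to scan the database multiple times: it mines all frequent itemsets in a single database scan.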
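The single-scan idea described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, whose details the abstract does not give; it only assumes that every non-empty subset of each transaction is counted during one pass. The function name `frequent_itemsets_one_scan` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets_one_scan(transactions, min_support):
    """Count every non-empty subset of every transaction in one pass.

    min_support is an absolute count. The work per transaction grows as
    2^n in its length n, so this sketch only suits short transactions.
    """
    counts = Counter()
    for transaction in transactions:  # the single database scan
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    # Keep only the subsets that reach the support threshold.
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical toy database of four transactions.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
frequent = frequent_itemsets_one_scan(transactions, min_support=2)
```

The trade-off is memory and per-transaction work in exchange for fewer scans, which is why a sketch like this is only reasonable when transactions are short.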

5.
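Incremental Updating Algorithm for Negative Association Rules   Total citations: 1, self-citations: 1, other citations: 0
This paper discusses the problem of updating negative association rules. Unlike the incremental updating of positive association rules, negative association rules exist not only in frequent itemsets but, more often, in infrequent itemsets. To address this problem, an incremental updating algorithm for negative association rules, NIUA, is proposed. It uses an improved Apriori algorithm and the properties of sets to mine frequent itemsets, infrequent itemsets, and negative association rules. Experimental results show that the algorithm is feasible.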

6.
Generating a Condensed Representation for Association Rules   Total citations: 1, self-citations: 0, other citations: 1
Association rule extraction from operational datasets often produces several tens of thousands, and even millions, of association rules. Moreover, many of these rules are redundant and thus useless. Using a semantics based on the closure of the Galois connection, we define a condensed representation for association rules. This representation is characterized by frequent closed itemsets and their generators. It contains the non-redundant association rules having minimal antecedent and maximal consequent, called min-max association rules. We think that these rules are the most relevant since they are the most general non-redundant association rules. Furthermore, this representation is a basis, i.e., a generating set for all association rules, their supports and their confidences, and all of them can be retrieved without accessing the data. We introduce algorithms for extracting this basis and for reconstructing all association rules. Results of experiments carried out on real datasets show the usefulness of this approach. In order to generate this basis when an algorithm for extracting frequent itemsets, such as Apriori, is used, we also present an algorithm for deriving frequent closed itemsets and their generators from frequent itemsets without using the dataset.
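Deriving closed itemsets and generators from frequent itemsets alone, as the last sentence of this abstract describes, can be sketched as follows. This is an illustrative quadratic-time check, not the authors' algorithm. It assumes `supports` holds every frequent itemset with its support count, and it ignores the empty set when testing for generators. The function names are hypothetical.

```python
def closed_itemsets(supports):
    """Keep itemsets that have no proper superset with equal support."""
    return {
        itemset: count
        for itemset, count in supports.items()
        if not any(itemset < other and count == other_count
                   for other, other_count in supports.items())
    }


def generators(supports):
    """Keep itemsets that have no proper non-empty subset with equal support."""
    return {
        itemset: count
        for itemset, count in supports.items()
        if not any(other < itemset and count == other_count
                   for other, other_count in supports.items())
    }


# Hypothetical input: all frequent itemsets of a toy database.
supports = {
    frozenset("a"): 3, frozenset("b"): 3, frozenset("c"): 2,
    frozenset("ab"): 2, frozenset("ac"): 2,
}
closed = closed_itemsets(supports)
minimal = generators(supports)
```

Checking only frequent itemsets is sufficient for the closed test, because any superset with the same support as a frequent itemset is itself frequent.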

7.
One fundamental problem for visualizing frequent itemsets and association rules is how to present a long border of frequent itemsets in an itemset lattice. Another problem comes from the lack of an effective visual metaphor to represent many-to-many relationships. This work proposes an approach for visualizing frequent itemsets and many-to-many association rules by a novel use of parallel coordinates. An association rule is visualized by connecting items in the rule, one item on each parallel coordinate, with continuous polynomial curves. In the presence of item taxonomy, each coordinate can be used to visualize an item taxonomy tree which can be expanded or shrunk by user interaction. This user interaction introduces a border, which separates displayable itemsets from nondisplayable ones, in the generalized itemset lattice. Only those itemsets that are both frequent and displayable are considered to be displayed. This approach of visualizing frequent itemsets and association rules has the following features: 1) It is capable of visualizing many-to-many rules and itemsets with many items. 2) It is capable of visualizing a large number of itemsets or rules by displaying only those ones whose items are selected by the user. 3) The closure properties of frequent itemsets and association rules are inherently supported such that the implied ones are not displayed. Usefulness of this approach is demonstrated through examples.

8.
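A New Dynamic Frequent Itemset Mining Method   Total citations: 1, self-citations: 0, other citations: 1
Frequent itemset mining is an important step in association rule mining, and mining association rules in an environment where data changes dynamically is of great practical significance. A dynamic frequent itemset mining algorithm is proposed. It builds on the results of the previous mining stage, which avoids excessive database scans that would hurt mining performance. When the global frequent itemsets are finally generated, no full database scan is needed; instead, the relevant subsets of transactions are scanned selectively according to earlier mining results. Experiments show that the algorithm far outperforms the Apriori algorithm and can effectively mine frequent itemsets in an environment where data changes dynamically.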

9.
An Effective Method for Updating Mined Association Rules   Total citations: 1, self-citations: 0, other citations: 1
王新 《计算机应用》2005,25(6):1360-1361,1372
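When mining association rules, users often need to adjust (raise or lower) the minimum support several times before they obtain useful association rules. This paper presents the PSI algorithm, which uses stored information to generate new candidate itemsets efficiently. Results show that it effectively reduces the number of candidate itemsets in each database scan.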

10.
In order to efficiently trace the changes of association rules over an online data stream, this paper proposes a method of generating association rules directly over the changing set of currently frequent itemsets. While all of the currently frequent itemsets in an online data stream are monitored by the estDec method, all the association rules of every frequent itemset in the prefix tree of the estDec method are generated by the proposed method in this paper. For this purpose, a traversal stack is introduced to efficiently enumerate all association rules in the prefix tree. This online implementation can avoid the drawbacks of the conventional two-step approach. In addition, the prefix tree itself can be utilized as an index structure for finding the current support of the antecedent of an association rule. Finally, the performance of the proposed method is analyzed by a series of experiments to identify its various characteristics.

11.
刘松 《微计算机应用》2006,27(5):566-569
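A new algorithm is proposed for association rule mining that explores the relationship between products and profit, called the weighted multiple-support association rule mining algorithm. The algorithm can set different support thresholds for products with different profits, and the association rules it produces address the problem that high-priced but rarely traded products are hard to mine.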

12.
Most methods for mining association rules from tabular data mine simple rules which only use the equality operator “=” in their items. For quantitative attributes, approaches tend to discretize domain values by partitioning them into intervals. Limiting the operator to “=” means that many interesting frequent patterns may go unidentified. Where there is an order between objects, operators such as greater than or less than a given value are as important as the equality operator. This motivates us to extend association rules from the simple equality operator to a more general set of operators. We address the problem of mining general association rules in tabular data where rules can have all operators {?, >, ≠, =} in their antecedent part. The proposed algorithm, mining general rules (MGR), is applicable to datasets with discrete-ordered attributes and to quantitative discretized attributes. The proposed algorithm stores candidate general itemsets in a tree structure in such a way that supports of complex itemsets can be recursively computed from supports of simpler itemsets. The algorithm is shown to have benefits in terms of time complexity and memory management, and it has good potential for parallelization.

13.
We examine the issue of mining association rules among items in a large database of sales transactions. Mining association rules means, given a database of sales transactions, discovering all associations among items such that the presence of some items in a transaction implies the presence of other items in the same transaction. The mining of association rules can be mapped into the problem of discovering large itemsets, where a large itemset is a group of items that appear in a sufficient number of transactions. The problem of discovering large itemsets can be solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally, this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. Determining large itemsets from a huge number of candidate sets in early iterations is usually the dominating factor for the overall data mining performance. To address this issue, we develop an effective algorithm for candidate set generation. It is a hash-based algorithm and is especially effective for the generation of a candidate set for large 2-itemsets. Explicitly, the number of candidate 2-itemsets generated by the proposed algorithm is orders of magnitude smaller than that generated by previous methods, thus resolving the performance bottleneck. Note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at a much earlier stage of the iterations, thereby reducing the computational cost for later iterations significantly. The advantage of the proposed algorithm also provides us the opportunity of reducing the amount of disk I/O required. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.
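The hash-based filtering of candidate 2-itemsets described in this abstract can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the paper's full algorithm: items are assumed to be small integers, the bucket function `(x * 10 + y) % num_buckets` and the function name `hashed_candidate_pairs` are choices made for the example, and the database trimming step is omitted.

```python
from collections import Counter
from itertools import combinations


def hashed_candidate_pairs(transactions, min_support, num_buckets=7):
    """Return (plain, filtered) candidate pairs after one pass over the data."""
    item_counts = Counter()
    buckets = [0] * num_buckets

    def bucket_of(x, y):
        # Assumed toy hash function; any deterministic hash would do.
        return (x * 10 + y) % num_buckets

    # Pass 1: count single items and hash every 2-subset into a bucket.
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        for x, y in combinations(items, 2):
            buckets[bucket_of(x, y)] += 1

    frequent_items = sorted(i for i, n in item_counts.items() if n >= min_support)
    # Apriori-style candidates: all pairs of frequent items.
    plain = list(combinations(frequent_items, 2))
    # A pair can only be frequent if its whole bucket reached the threshold.
    filtered = [p for p in plain if buckets[bucket_of(*p)] >= min_support]
    return plain, filtered


# Hypothetical toy database.
transactions = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
plain, filtered = hashed_candidate_pairs(transactions, min_support=2)
```

A bucket count is an upper bound on the support of every pair hashed into it, so the filter never discards a truly frequent pair; collisions can only let extra candidates through.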

14.
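Research on Binary-Based Crossover Mining of Association Rules   Total citations: 1, self-citations: 1, other citations: 0
To make it easier to generate candidate frequent itemsets and compute itemset support counts, binary-based association rule mining algorithms have been proposed. When searching for candidate frequent itemsets, however, they still start from set theory and follow the traditional approach of searching supersets or subsets, which limits their efficiency to some extent. This paper therefore proposes a binary-based crossover algorithm for mining association rules. It generates candidate frequent itemsets automatically by alternately increasing and decreasing numeric values, which shrinks the search space of candidate frequent itemsets, and it uses numeric characteristics to reduce the number of transactions scanned when computing support counts, so its efficiency improves markedly. Experimental results show that the algorithm is fast and effective compared with existing binary-based association rule mining algorithms.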
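As a rough illustration of the binary representation mentioned in this abstract, the sketch below encodes each itemset as an integer bitmask and counts support with a bitwise AND. The abstract does not spell out its numeric shortcut; the `mask < candidate` skip used here is one simple numeric property assumed for the example, and the names `encode` and `support_count` are hypothetical.

```python
def encode(itemset, item_bit):
    """Encode an itemset as an integer with one bit per item."""
    mask = 0
    for item in itemset:
        mask |= 1 << item_bit[item]
    return mask


def support_count(candidate, transaction_masks):
    """Count transactions whose bitmask contains every bit of the candidate."""
    count = 0
    for mask in transaction_masks:
        # A transaction that contains the candidate is numerically >= it,
        # so smaller values can be skipped without the bitwise test.
        if mask < candidate:
            continue
        if (mask & candidate) == candidate:
            count += 1
    return count


# Hypothetical item-to-bit assignment and toy database.
item_bit = {"a": 0, "b": 1, "c": 2}
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
masks = [encode(t, item_bit) for t in transactions]
ab_support = support_count(encode(["a", "b"], item_bit), masks)
```

If the transaction masks are kept sorted, the same numeric property lets a scan start from the first value that is at least as large as the candidate.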

15.
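To address the low success rate and high cost of searching for rare resources in unstructured P2P networks, two rare-resource search strategies based on network coverage are proposed: RSR and FRSR. RSR improves request forwarding on top of random walk by taking the popularity of neighbor nodes into account, while FRSR improves the random-walk forwarding strategy by combining it with flooding search. Experimental results show that, compared with ordinary random walk, RSR reduces the time to find rare resources by 22.9%, raises the average search success rate by 26.2%, and lowers communication overhead by 22.8%. Compared with random forwarding, FRSR reduces search time by 15.4% and raises the search success rate by 14.2%.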

16.
王淞  黄浩  余果  梁楠  王黎维  孙月明 《软件学报》2016,27(9):2320-2331
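The goal of rare category detection is to find, for each class in a dataset without class labels, and especially for rare classes that contain only a few data examples, at least one data example that proves the class exists in the dataset. The technique has a wide range of applications in real-world problems such as financial fraud detection and network intrusion detection. However, existing rare category detection algorithms often suffer from the following problems: (1) relatively high time complexity, or (2) a need for prior knowledge about the original dataset, such as the proportion of data examples in each class. This paper proposes KRED, a prior-free, fast rare category detection algorithm based on the k-nearest neighbor graph, which locates rare classes by exploiting the inconsistency with the surrounding data distribution that arises because rare-class examples are tightly clustered within a small region. To this end, KRED converts the given dataset into a k-nearest neighbor graph and computes the changes in in-degree and edge length of each vertex in the graph. Finally, the data examples corresponding to the vertices with the largest changes are taken as candidate examples of rare classes. Experimental results show that KRED effectively improves the efficiency of discovering each class in the dataset and markedly shortens the running time of the algorithm.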

17.
Mining Informative Rule Set for Prediction   Total citations: 2, self-citations: 0, other citations: 2
Mining transaction databases for association rules usually generates a large number of rules, most of which are unnecessary when used for subsequent prediction. In this paper we define a rule set for a given transaction database that is much smaller than the association rule set but makes the same predictions as the association rule set by the confidence priority. We call this rule set the informative rule set. The informative rule set is not constrained to particular target items, and it is smaller than the non-redundant association rule set. We characterise relationships between the informative rule set and the non-redundant association rule set. We present an algorithm that generates the informative rule set directly, without first generating all frequent itemsets, and that accesses the database less frequently than other direct methods. We show experimentally that the informative rule set is much smaller and can be generated more efficiently than both the association rule set and the non-redundant association rule set.

18.
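Association rule mining has been a very active area of data mining in recent years, and frequent itemset mining is its most important task. The set of maximal frequent itemsets is far smaller than the set of frequent itemsets, and all frequent itemsets can be derived from the maximal frequent itemsets, so much research has been devoted specifically to mining maximal frequent itemsets. This paper gives the basic concepts of association rules and related terminology, and it analyzes and evaluates algorithms for mining maximal frequent itemsets, to help researchers improve existing algorithms and propose new ones with better performance.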
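The claim that all frequent itemsets can be derived from the maximal ones follows from the fact that every non-empty subset of a frequent itemset is itself frequent. A minimal sketch of that expansion is shown here; the function name `expand_maximal` and the example itemsets are hypothetical.

```python
from itertools import combinations


def expand_maximal(maximal_itemsets):
    """Enumerate every non-empty subset of each maximal frequent itemset."""
    frequent = set()
    for itemset in maximal_itemsets:
        items = sorted(itemset)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                frequent.add(frozenset(subset))
    return frequent


# Hypothetical maximal frequent itemsets of a toy database.
maximal = [{"a", "b"}, {"a", "c"}]
all_frequent = expand_maximal(maximal)
```

The expansion recovers which itemsets are frequent but not their support counts, which still have to be obtained from the data or from a richer representation such as closed itemsets.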

19.
Association rule mining has contributed to many advances in the area of knowledge discovery. However, the quality of the discovered association rules is a big concern and has drawn more and more attention recently. One problem with the quality of the discovered association rules is the huge size of the extracted rule set. Often for a dataset, a huge number of rules can be extracted, but many of them can be redundant to other rules and thus useless in practice. Mining non-redundant rules is a promising approach to solve this problem. In this paper, we first propose a definition for redundancy, then propose a concise representation, called a Reliable basis, for representing non-redundant association rules. The Reliable basis contains a set of non-redundant rules which are derived using frequent closed itemsets and their generators instead of using frequent itemsets that are usually used by traditional association rule mining approaches. An important contribution of this paper is that we propose to use the certainty factor as the criterion to measure the strength of the discovered association rules. Using this criterion, we can ensure the elimination of as many redundant rules as possible without reducing the inference capacity of the remaining extracted non-redundant rules. We prove that the redundancy elimination, based on the proposed Reliable basis, does not reduce the strength of belief in the extracted rules. We also prove that all association rules, their supports and confidences, can be retrieved from the Reliable basis without accessing the dataset. Therefore the Reliable basis is a lossless representation of association rules. Experimental results show that the proposed Reliable basis can significantly reduce the number of extracted rules. We also conduct experiments on the application of association rules to the area of product recommendation. The experimental results show that the non-redundant association rules extracted using the proposed method retain the same inference capacity as the entire rule set. This result indicates that using non-redundant rules only is sufficient to solve real problems without using the entire rule set.
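For readers unfamiliar with the certainty factor named in this abstract, the sketch below computes it under the definition commonly used for association rules A → B, which is assumed here since the abstract does not state the formula: the change from the prior support of B to the confidence of the rule, scaled into the range [-1, 1]. The function name and the sample values are hypothetical.

```python
def certainty_factor(confidence, consequent_support):
    """Certainty factor of a rule A -> B (commonly used definition, assumed).

    confidence is conf(A -> B) and consequent_support is supp(B),
    both given as fractions between 0 and 1.
    """
    if confidence > consequent_support:
        # B becomes more likely given A: scale by the room left above supp(B).
        return (confidence - consequent_support) / (1 - consequent_support)
    if confidence < consequent_support:
        # B becomes less likely given A: scale by supp(B), giving a negative value.
        return (confidence - consequent_support) / consequent_support
    return 0.0  # A tells us nothing about B


# Hypothetical rule with confidence 0.8 whose consequent has support 0.5.
cf = certainty_factor(0.8, 0.5)
```

Unlike confidence alone, this measure is zero when the antecedent and consequent are independent, so a rule with high confidence but a very common consequent does not look strong.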

20.
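Image Feature Extraction Based on Frequent Itemsets   Total citations: 1, self-citations: 1, other citations: 0
In image analysis, many studies have explored mining image texture association rules by building transaction datasets from adjacent image pixels, but texture association rules that retain only the frequent itemsets of maximal size lose much information. This paper therefore proposes an image feature extraction method based on frequent itemsets. The method first filters candidate frequent itemsets by their frequency and spatial distribution, and then defines a spatial expressiveness value for each frequent itemset to build the feature set. In simulation tests on remote sensing images, because the EM algorithm is sensitive to its initial settings, the final clustering result is determined by specifying different numbers of clusters for the same feature set and comparing the log-likelihood values. Experimental results show that the proposed frequent itemsets express image features well.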
