Similar literature
1.
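An efficient method for mining multi-level and generalized association rules   Total citations: 4, self-citations: 1, citations by others: 3
毛宇星, 陈彤兵, 施伯乐. Journal of Software (《软件学报》), 2011, 22(12): 2965-2980
Based on an in-depth study of taxonomy data, this paper proposes an efficient method for mining multi-level association rules. First, a domain knowledge-based item correlation model (DICM) is built from the domain knowledge of the taxonomy data, and the items are hierarchically clustered with this model. Next, the transaction database is reduced and partitioned according to the clustering result. Finally, the reduced and partitioned transaction database is mapped to a compressed AFOPT tree structure, and frequent itemsets are mined by traversing the AFOPT tree instead of the original transaction database. Because the transaction database is smaller and the compressed AFOPT structure is used, the method saves I/O time and greatly improves the efficiency of multi-level association rule mining. Based on this method, a top-down multi-level association rule mining algorithm, TD-CBP-MLARM, and a bottom-up one, BU-CBP-MLARM, are given. The method is also extended to generalized association rule mining, yielding an efficient generalized association rule mining algorithm, CBP-GARM. Extensive experiments on randomly generated synthetic data show that the proposed multi-level and generalized algorithms not only guarantee the correctness and completeness of the mined frequent itemsets, but also achieve better mining efficiency and scalability than the latest comparable algorithms.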
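For orientation only, the sketch below shows a common baseline way to let a miner see items at every level of a taxonomy: each transaction is extended with the ancestors of its items. It is not the DICM or AFOPT-based method of the paper above; the `parent` mapping, the function names, and the toy items are all assumptions made for illustration.

```python
def ancestors(item, parent):
    """Return all ancestors of `item`, following child -> parent links upward."""
    result = []
    while item in parent:
        item = parent[item]
        result.append(item)
    return result


def extend_transaction(transaction, parent):
    """Add every ancestor of every item, so higher-level items can be counted too."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item, parent))
    return extended


# Hypothetical two-level taxonomy: leaf -> category -> root.
parent = {
    "skim milk": "milk",
    "milk": "food",
    "wheat bread": "bread",
    "bread": "food",
}
extended = extend_transaction({"skim milk", "wheat bread"}, parent)
```

After extension, any ordinary frequent itemset miner run on the extended transactions counts support for "milk" or "food" as well as for the leaf items.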

2.
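Massive remote sensing image data are mined using the Apriori property and bit-set techniques; the algorithm scans the transaction database only once. It saves the logical AND operations on bit sets that the MBSA algorithm requires and improves efficiency, especially for massive remote sensing image data. The algorithm is applied to association rule mining on remote sensing images.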
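The bit-set representation this abstract builds on can be sketched as follows. This is only the basic idea (one database pass to build per-item bit sets, then support via bitwise AND); the abstract does not say how the paper avoids some of those AND operations, so that part is not shown. Function names and the toy transactions are assumptions.

```python
from functools import reduce


def build_bitsets(transactions):
    """Single pass: bit i of bitsets[item] is set when transaction i contains item."""
    bitsets = {}
    for i, transaction in enumerate(transactions):
        for item in transaction:
            bitsets[item] = bitsets.get(item, 0) | (1 << i)
    return bitsets


def support(itemset, bitsets):
    """Support count of a non-empty itemset: popcount of the AND of its items' bit sets."""
    combined = reduce(lambda a, b: a & b, (bitsets.get(item, 0) for item in itemset))
    return bin(combined).count("1")


transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
bitsets = build_bitsets(transactions)
```

Once the bit sets exist, candidate itemsets can be counted without touching the transaction database again, which is what makes a single scan sufficient.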

3.
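A comparative study of association rule mining algorithms in data mining   Total citations: 27, self-citations: 12, citations by others: 15
This paper analyzes the state of research on association rule mining algorithms in data mining, and proposes a new measure of the value of association rules as well as directions for further research. Taking the core Apriori algorithm as the starting point, typical association rule mining algorithms are surveyed using literature search and comparative analysis. The findings are: (1) even after optimization, Apriori has inherent shortcomings that cannot be overcome and that need further study; (2) future research will focus on improving the efficiency of algorithms that handle very large volumes of data and unstructured data, on integration with OLAP, and on visualization of the results.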

4.
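Partitioning of continuous data and association rule discovery   Total citations: 2, self-citations: 1, citations by others: 1
Association rule mining is an important data mining problem. Current algorithms mainly study association rule mining under the support-confidence framework, but such rules apply only to transaction-type databases. Real databases contain a great deal of continuous data, for which classical association rules are not suitable. This paper introduces a preprocessing step for continuous data sets, namely a distance-based partitioning of the data items in the database, and gives the design idea of an algorithm based on clustering.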
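As a rough illustration of distance-based partitioning (not the clustering algorithm of the paper, whose details the abstract does not give), the sketch below splits sorted one-dimensional values into intervals wherever the gap between neighbours exceeds a threshold. The function name and the `max_gap` parameter are hypothetical.

```python
def partition_by_distance(values, max_gap):
    """Group sorted values into (low, high) intervals; a gap larger than max_gap starts a new one."""
    ordered = sorted(values)
    if not ordered:
        return []
    intervals = []
    start = prev = ordered[0]
    for value in ordered[1:]:
        if value - prev > max_gap:
            intervals.append((start, prev))
            start = value
        prev = value
    intervals.append((start, prev))
    return intervals


# Toy continuous attribute, e.g. ages.
intervals = partition_by_distance([41, 22, 65, 21, 40, 23], max_gap=5)
```

Each resulting interval can then be treated as a single Boolean item, so that a support-confidence miner can be applied to the discretized data.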

5.
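As real-world databases to be mined keep growing, the memory available to the system becomes the bottleneck of association rule mining with the FP-GROWTH algorithm. To escape this memory constraint and mine association rules from large databases, disk-based association rule mining has become an important research direction. This paper improves the original FP-TREE data structure and proposes a novel disk-table-based algorithm, DTRFP-GROWTH (disk table resident FP-TREE growth). The algorithm stores the FP-TREE in disk tables to reduce memory use, so that mining can continue where the traditional FP-GROWTH algorithm consumes too much memory to proceed; it suits settings where space performance takes priority. The algorithm also integrates association rule mining with a relational database, overcoming the low efficiency and high development difficulty of file-system-based algorithms. Validation experiments and a performance analysis were carried out on real data sets. The results show that, when memory is limited, DTRFP-GROWTH is an effective disk-based association rule mining algorithm.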

6.
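王玮, 陈恩红. Computer Engineering (《计算机工程》), 2000, 26(9): 17-18, 29
Association rule mining is an important data mining problem. Current algorithms mainly study association rule mining under the support-confidence framework, but such rules apply only to transaction-type databases. Real databases contain a great deal of continuous data, for which classical association rules are not suitable. This paper introduces a preprocessing step for continuous data sets, namely a distance-based partitioning of the data items in the database, and gives the design idea of an algorithm based on clustering.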

7.
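To improve the efficiency with which association rule mining algorithms process databases, an improved algorithm, DMApriori, is proposed on the basis of a study of matrix-based association rule mining algorithms. Four test data sets generated by a program simulating supermarket purchases were mined with DMApriori. The experimental results show that the algorithm reduces association rule mining time by 20% on average. When computing the frequent itemsets of the database, effective pruning of the Boolean matrix greatly reduces the amount of data scanned at each level, and only part of the data is scanned when counting each itemset, which improves the performance of association rule mining.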
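The kind of Boolean-matrix pruning mentioned in this abstract can be sketched under simple assumptions: rows are transactions, columns are items, columns of infrequent items are dropped, and rows with fewer than k ones are dropped because they cannot contain any k-itemset. This is a generic single round of pruning, not the DMApriori algorithm itself; `prune_matrix` and its parameters are hypothetical.

```python
def prune_matrix(matrix, items, min_support, k):
    """One pruning round before counting k-itemsets on a 0/1 transaction-item matrix."""
    # Keep only columns (items) whose support count reaches min_support.
    keep = [j for j in range(len(items))
            if sum(row[j] for row in matrix) >= min_support]
    kept_items = [items[j] for j in keep]
    rows = [[row[j] for j in keep] for row in matrix]
    # A row with fewer than k ones cannot support any k-itemset.
    rows = [row for row in rows if sum(row) >= k]
    return rows, kept_items


items = ["a", "b", "c", "d"]
matrix = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
]
pruned_rows, pruned_items = prune_matrix(matrix, items, min_support=2, k=2)
```

In this toy case item "d" (support 1) and the last transaction (a single item) are removed, so the next level scans a 4 x 3 matrix instead of a 5 x 4 one.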

8.
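Considering only item weights or only temporal semantics during data mining leads to incomplete results. To address this problem, weighted association rules, temporal association rules, and periodic patterns in temporal data are studied; the concepts of weight, K-support expectation, and period are introduced into temporal association rules; and a weighted temporal association rule mining algorithm based on periodic patterns is proposed. Experiments on the audit data of a management system show that the algorithm accurately mines the weighted temporal association rules in the database and, compared with a weighted association rule algorithm, produces more complete results at the same time complexity.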

9.
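崔建, 李强, 杨龙坡. Computer Science (《计算机科学》), 2011, 38(4): 216-220
To further address the high CPU time and frequent I/O incurred when mining association rules from large transaction databases, an improved association rule mining algorithm based on a vertical data layout, called VARMLDb, is presented. The algorithm first divides the database into partitions that fit in memory, and then combines a directed acyclic graph with the vertical diffset (difference set) format to store and compute frequent itemsets. This greatly reduces the memory needed for intermediate results and resolves the low efficiency of traditional vertical mining algorithms on dense databases, making the algorithm suitable for association rule mining on large dense databases. Drawing on the strengths of the CARMA algorithm, the whole algorithm completes the mining process with only two scans of the database. Experiments show that the algorithm is correct and that VARMLDb runs efficiently on large dense databases.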
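To make the diffset idea concrete, here is a minimal sketch for 2-itemsets under the usual vertical-layout definitions: the diffset of {x, y} relative to x is the set of transaction ids that contain x but not y, and support({x, y}) = support(x) - |diffset|. It does not reproduce VARMLDb's partitioning or its directed acyclic graph; the function names and toy data are assumptions.

```python
def build_tidsets(transactions):
    """Vertical layout: map each item to the set of ids of transactions containing it."""
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def pair_support_via_diffset(x, y, tidsets):
    """Support of {x, y} computed from the diffset t(x) - t(y) instead of an intersection."""
    diffset = tidsets[x] - tidsets[y]  # transactions that have x but lack y
    return len(tidsets[x]) - len(diffset), diffset


transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
tidsets = build_tidsets(transactions)
support_ab, diffset_ab = pair_support_via_diffset("a", "b", tidsets)
```

On dense data most items co-occur, so diffsets tend to be much smaller than the tidsets or their intersections, which is the memory saving the abstract refers to.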

10.
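Data sets for association rule mining keep growing, while many sampling algorithms have low accuracy and must also solve a series of NP-hard problems. Building on an analysis of sampling based on frequent 1-itemsets, a high-accuracy association rule mining algorithm based on even partitioning of frequent n-itemsets, the EHAC algorithm, is proposed. Theory and experiments both show that EHAC improves mining accuracy: while partitioning the data evenly, it keeps the frequent n-itemsets as evenly partitioned as possible, reduces the number of database scans, and shrinks the database to some extent.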

11.
Association rule mining, originally proposed for market basket data, has potential applications in many areas. Spatial data, such as remote sensed imagery (RSI) data, is one of the promising application areas. Extracting interesting patterns and rules from spatial data sets, composed of images and associated ground data, can be of importance in precision agriculture, resource discovery, and other areas. However, in most cases, the sizes of the spatial data sets are too large to be mined in a reasonable amount of time using existing algorithms. In this paper, we propose an efficient approach to derive association rules from spatial data using Peano Count Tree (P-tree) structure. P-tree structure provides a lossless and compressed representation of spatial data. Based on P-trees, an efficient association rule mining algorithm PARM with fast support calculation and significant pruning techniques is introduced to improve the efficiency of the rule mining process. The P-tree based Association Rule Mining (PARM) algorithm is implemented and compared with FP-growth and Apriori algorithms. Experimental results showed that our algorithm is superior for association rule mining on RSI spatial data.

12.
The discovery of association rules is a very efficient data mining technique that is especially suitable for large amounts of categorical data. This paper shows how the discovery of association rules can be of benefit for numeric data as well. Based on a review of previous approaches we introduce Q2, a faster algorithm for the discovery of multi-dimensional association rules over ordinal data. We experimentally compare the new algorithm with the previous approach, obtaining performance improvements of more than an order of magnitude on supermarket data. In addition, a new absolute measure for the interestingness of quantitative association rules is introduced. It is based on the view that quantitative association rules have to be interpreted with respect to their Boolean generalizations. This measure has two major benefits compared to the previously used relative interestingness measure; first, it speeds up rule extraction and evaluation and second, it is easier to interpret for a user. Finally we introduce a rule browser which supports the exploration of ordinal data with quantitative association rules.

13.
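Application of a data fusion-based knowledge discovery method to network management   Total citations: 2, self-citations: 0, citations by others: 2
A framework for a data fusion-based knowledge discovery system for network management is proposed. The paper studies the use of data fusion in the data preparation and preprocessing stages of knowledge discovery, and the suitability of association rules for expressing network management knowledge. Given the time-series nature of network management data, episode rules are introduced to represent the knowledge to be discovered. Mining algorithms for association rules and episode rules in network fault management, and an algorithm for incremental knowledge update, are presented, and the implementation of a prototype system is briefly described.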

14.
Generalized multidimensional association rules   Total citations: 2, self-citations: 0, citations by others: 2
The problem of association rule mining has gained considerable prominence in the data mining community for its use as an important tool of knowledge discovery from large-scale databases, and there has been a spurt of research activity around this problem. Traditional association rule mining is limited to intra-transaction associations. Only recently was the concept of the N-dimensional inter-transaction association rule (NDITAR) proposed by H. J. Lu. This paper modifies and extends Lu's definition of NDITAR based on an analysis of its limitations, and subsequently introduces the generalized multidimensional association rule (GMDAR), which is more general, flexible, and reasonable than NDITAR.

15.
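An algorithm for mining constrained association rules in a distributed environment   Total citations: 2, self-citations: 0, citations by others: 2
Association rules are an important research topic in data mining. Constraint-based association rule mining supports interactive exploration and analysis. This paper studies the mining of constrained association rules in a distributed environment. Building on the parallel association rule mining algorithm CD and the constrained association rule mining algorithm Direct, a new distributed algorithm for mining constrained association rules, DMA_IC, is proposed. The algorithm is effective for mining constrained association rules in a distributed setting. The communication performance of DMA_IC is also discussed.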

16.
Association rule mining is an effective data mining technique that has been widely used in health informatics research since its introduction. Because health informatics has received a lot of attention from researchers in the last decade and has developed various sub-domains, it is both interesting and essential to review state-of-the-art health informatics research. Knowledge discovery researchers and practitioners have applied an array of data mining techniques to extract knowledge from health data, and this survey studies in detail the application of association rule mining techniques to the health informatics domain. A critical analysis of the literature on applications of association rule mining to health informatics from 2005 to 2014 shows that, despite more efficient alternatives, the Apriori algorithm is still a widely used frequent itemset generation technique in this domain. Moreover, other limitations of applying association rule mining to health informatics are identified, and recommendations are made to mitigate them. Furthermore, the algorithms and tools used are identified, conclusions are drawn from the literature surveyed, and future research directions are presented.

17.
Simple association rules (SAR) and the SAR-based rule discovery   Total citations: 13, self-citations: 0, citations by others: 13
Association rule mining is one of the most important fields in data mining and knowledge discovery in databases. Rules explosion is a problem of concern, as conventional mining algorithms often produce too many rules for decision makers to digest. Instead, this paper concentrates on a smaller set of rules, namely, a set of simple association rules each with its consequent containing only a single attribute. Such a rule set can be used to derive all other association rules, meaning that the original rule set based on conventional algorithms can be 'recovered' from the simple rules without any information loss. The number of simple rules is much less than the number of all rules. Moreover, corresponding algorithms are developed such that certain forms of rules (e.g. 'P ⇒ ?' or '? ⇒ Q') can be generated in a more efficient manner based on simple rules.
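A rule with a single-item consequent, as described in this abstract, can be enumerated from one frequent itemset as sketched below. This shows only the standard confidence computation, confidence(X ⇒ i) = support(X ∪ {i}) / support(X), not the paper's SAR discovery algorithms; `single_consequent_rules` and the toy transactions are hypothetical.

```python
def single_consequent_rules(itemset, transactions, min_conf):
    """List rules (itemset - {i}) => i whose confidence is at least min_conf."""
    def count(items):
        # Number of transactions that contain every item in `items`.
        return sum(1 for t in transactions if items <= t)

    itemset = frozenset(itemset)
    full_count = count(itemset)
    rules = []
    for item in sorted(itemset):
        antecedent = itemset - {item}
        if not antecedent:
            continue  # a 1-itemset yields no rule
        antecedent_count = count(antecedent)
        if antecedent_count == 0:
            continue
        confidence = full_count / antecedent_count
        if confidence >= min_conf:
            rules.append((antecedent, item, confidence))
    return rules


transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
rules = single_consequent_rules({"a", "c"}, transactions, min_conf=0.8)
```

Here {a} ⇒ c holds with confidence 3/3, while {c} ⇒ a reaches only 3/4 and is filtered out by the 0.8 threshold.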

18.
Discovery of frequent DATALOG patterns   Total citations: 19, self-citations: 0, citations by others: 19
Discovery of frequent patterns has been studied in a variety of data mining settings. In its simplest form, known from association rule mining, the task is to discover all frequent itemsets, i.e., all combinations of items that are found in a sufficient number of examples. The fundamental task of association rule and frequent set discovery has been extended in various directions, allowing more useful patterns to be discovered with special purpose algorithms. We present WARMR, a general purpose inductive logic programming algorithm that addresses frequent query discovery: a very general DATALOG formulation of the frequent pattern discovery problem. The motivation for this novel approach is twofold. First, exploratory data mining is well supported: WARMR offers the flexibility required to experiment with standard and in particular novel settings not supported by special purpose algorithms. Also, application prototypes based on WARMR can be used as benchmarks in the comparison and evaluation of new special purpose algorithms. Second, the unified representation gives insight to the blurred picture of the frequent pattern discovery domain. Within the DATALOG formulation a number of dimensions appear that relink diverged settings. We demonstrate the frequent query approach and its use on two applications, one in alarm analysis, and one in a chemical toxicology domain.

19.
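姜伟. Microcomputer Applications (《微计算机应用》), 2007, 28(5): 549-551
A teaching evaluation and knowledge discovery approach based on online analytical processing (OLAP) is proposed. A six-dimensional data cube built from students, knowledge points, categories, and other dimensions is given, together with a solution for mining this cube using OLAP and association rules. The method is applied to a student examination system, and the useful conclusions obtained are used to guide the school's teaching.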

20.
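An incremental construction algorithm for the association rule lattice based on attribute linked lists   Total citations: 4, self-citations: 0, citations by others: 4
Association rule discovery, one of the core tasks of data mining, has been widely studied. The concept lattice derived from a binary relation is a very useful formal tool that is well suited to discovering latent concepts in data. The paper analyzes the relationship between concept lattices and association rule extraction, modifies the lattice structure as needed, and proposes the notion of an association rule lattice. It also proposes a data structure called the attribute linked list, on which an incremental construction algorithm for the association rule lattice is based. Analysis of the algorithm shows better time efficiency than Godin's algorithm.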
