Similar Articles
20 similar articles found
1.
Market basket analysis is one of the typical applications of association rule mining. The valuable information discovered through data mining can be used to support decision making. Generally, support and confidence (objective measures) are used to evaluate the interestingness of association rules. However, in some cases the rules discovered using these two measures may not be profitable or actionable (that is, not interesting) to enterprises. Therefore, discovering patterns by considering both objective measures (e.g. probability) and subjective measures (e.g. profit) is a challenge in data mining, particularly in marketing applications. This paper focuses on pattern evaluation in the knowledge discovery process using the concept of profit mining. Data Envelopment Analysis is used to calculate the efficiency of the discovered association rules with respect to multiple objective and subjective measures. After their efficiency is evaluated, the association rules are categorized into two classes: relatively efficient (interesting) and relatively inefficient (uninteresting). To classify these two classes, a Decision Tree (DT)-based classifier is built using the attributes of the association rules. The DT classifier can be used to identify the characteristics of interesting association rules and to classify unknown (new) association rules.
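The two objective measures named above can be computed directly from transaction data. The following is a minimal Python sketch, assuming transactions are sets of item names; the basket data and function names are illustrative only, and the paper's DEA efficiency scoring and profit measures are not shown.

```python
from typing import FrozenSet, List, Set


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(lhs: FrozenSet[str], rhs: FrozenSet[str],
               transactions: List[Set[str]]) -> float:
    """Estimated P(rhs | lhs) = support(lhs | rhs) / support(lhs)."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0
    return support(lhs | rhs, transactions) / lhs_support


# Toy baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(frozenset({"bread", "milk"}), baskets))                  # 0.5
print(confidence(frozenset({"bread"}), frozenset({"milk"}), baskets))  # about 0.667
```

A rule can score well on both measures and still carry little profit, which is the gap the paper's subjective measures are meant to close.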

2.
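An Association Mining Algorithm Based on "Novelty Degree"   Total citations: 2, self-citations: 2, citations by others: 0
The goal of association mining is to discover, from large amounts of data, association rules that are useful, novel, and important to the user. Traditional association mining algorithms generate a large number of trivial rules that are obvious to the user, burying the novel rules that are genuinely useful, while some improved algorithms that target novelty suffer from complex prior-knowledge representation and a very heavy workload. In this paper, we use a simple classification tree and introduce the concept of "novelty degree" to improve the Apriori algorithm, obtaining a novelty-degree-based association mining algorithm. This algorithm fully accounts for novelty during the mining process while overcoming the difficulty of overly complex prior-knowledge representation.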
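For reference, the standard Apriori algorithm that this paper takes as its starting point can be sketched as below. This is a plain textbook version, assuming set-valued transactions and a relative minimum support; the classification tree and "novelty degree" scoring proposed in the paper are not reproduced.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Plain level-wise Apriori: returns each frequent itemset with its support."""
    n = len(transactions)
    if n == 0:
        return {}

    def sup(candidate: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if candidate <= t) / n

    # Level 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    scored = {c: sup(c) for c in singles}
    current = {c for c, s in scored.items() if s >= min_support}
    frequent = {c: scored[c] for c in current}
    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        scored = {c: sup(c) for c in candidates}
        current = {c for c, s in scored.items() if s >= min_support}
        frequent.update({c: scored[c] for c in current})
        k += 1
    return frequent


# Toy data, invented for illustration.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
result = apriori(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps {bread, milk, butter} from ever being counted here: its subset {milk, butter} is already infrequent.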

3.

Association rule mining is a popular data mining modeling tool. It discovers interesting associations or correlation relationships among a large set of data items, revealing attribute values that frequently occur together in a given dataset. Despite their great potential benefit, current association rule modeling tools are far from optimal. This article studies how visualization techniques can be applied to facilitate the association rule modeling process, in particular which visualization elements should be incorporated and how they should be displayed. This research proposes original designs for the visualization of rules, the integration of data and rule visualizations, and the visualization of the rule derivation process, to support interactive visual association rule modeling. Experimental results indicated that, compared with an automatic association rule modeling process, the proposed interactive visual modeling can significantly improve modeling effectiveness, enhance understanding of the applied algorithm, and bring users greater satisfaction with the task. The proposed integration of data and rule visualizations can also significantly facilitate understanding of rules compared with its non-integrated counterpart.

4.
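Research on a Multi-Strategy Method for Mining Rules of Interest   Total citations: 20, self-citations: 1, citations by others: 19
Data mining discovers a large number of rules in large databases, and selecting the rules of interest is an important research topic in knowledge discovery. This paper studies how to use domain knowledge to measure the subjective interestingness of rules, gives a function for computing objective interestingness that measures the simplicity and novelty of rules, and proposes a multi-strategy method for selecting the rules that interest the user.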

5.
The motivation for regional association rule mining and scoping is driven by the facts that global statistics seldom provide useful insight and that most relationships in spatial datasets are geographically regional, rather than global. Furthermore, when using traditional association rule mining, regional patterns frequently fail to be discovered due to insufficient global confidence and/or support. In this paper, we systematically study this problem and address the unique challenges of regional association mining and scoping: (1) region discovery: how to identify interesting regions from which novel and useful regional association rules can be extracted; (2) regional association rule scoping: how to determine the scope of regional association rules. We investigate the duality between regional association rules and regions where the associations are valid: interesting regions are identified to seek novel regional patterns, and a regional pattern has a scope of a set of regions in which the pattern is valid. In particular, we present a reward-based region discovery framework that employs a divisive grid-based supervised clustering for region discovery. We evaluate our approach in a real-world case study to identify spatial risk patterns from arsenic in the Texas water supply. Our experimental results confirm and validate research results in the study of arsenic contamination, and our work leads to the discovery of novel findings to be further explored by domain scientists.
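The core observation, that a rule can be weak globally yet strong inside a region, can be illustrated with a toy computation. This sketch assumes the regions are already known and each record carries a region label; it does not implement the paper's reward-based grid clustering for region discovery, and the data are synthetic.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Tuple[str, Set[str]]  # (region id, items observed at one location)


def rule_confidence(records: List[Record], lhs: Set[str], rhs: Set[str]) -> float:
    """Confidence of lhs -> rhs over the given records (0.0 if lhs never occurs)."""
    covered = [items for _, items in records if lhs <= items]
    if not covered:
        return 0.0
    return sum(1 for items in covered if rhs <= items) / len(covered)


def regional_confidence(records: List[Record], lhs: Set[str], rhs: Set[str]) -> Dict[str, float]:
    """Confidence of the same rule evaluated separately inside each region."""
    by_region: Dict[str, List[Record]] = defaultdict(list)
    for record in records:
        by_region[record[0]].append(record)
    return {region: rule_confidence(recs, lhs, rhs) for region, recs in by_region.items()}


# Synthetic data: x -> y holds everywhere in "north" and nowhere in "south".
records = [("north", {"x", "y"})] * 4 + [("south", {"x"})] * 4
print(rule_confidence(records, {"x"}, {"y"}))      # 0.5
print(regional_confidence(records, {"x"}, {"y"}))  # {'north': 1.0, 'south': 0.0}
```

With a global minimum confidence of, say, 0.8, the rule would be discarded, even though it holds without exception in one region.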

6.
An RNA pseudoknot consists of nonnested double-stranded stems connected by single-stranded loops. RNA pseudoknots are increasingly recognized as one of the most prevalent RNA structures, fulfilling a diverse set of biological roles within cells, and studies of RNA pseudoknotted structures, as well as of the functions assigned to them, are expanding rapidly. These studies not only produce valuable structural data but also facilitate an understanding of the structural and functional characteristics of RNA molecules. PseudoBase is a database providing structural, functional, and sequence data related to RNA pseudoknots. To capture the features of RNA pseudoknots, we present a novel framework that uses quantitative association rule mining to analyze the pseudoknot data. The derived rules are classified into specified association groups regarding the structure, function, and category of RNA pseudoknots. The discovered association rules assist biologists in identifying significant knowledge about structure-function and structure-category relationships. A brief biological interpretation of the relationships is presented, and their potential correlations with each other are highlighted.

7.
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data mining algorithms identify the relationships among transactions using binary values; however, transactions with quantitative values are common in real-world applications. This paper therefore proposes a new data mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm integrates fuzzy set concepts and the Apriori mining algorithm to find interesting fuzzy association rules in given transaction data sets. Experiments with student grades at I-Shou University were also conducted to verify the performance of the proposed algorithm.
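A common first step in fuzzy association mining is to map each quantitative value to membership degrees in linguistic terms and then compute a fuzzy support. The sketch below assumes hypothetical piecewise-linear membership functions for a 0-100 grade and uses the minimum operator to combine the items of an itemset; both choices are illustrative assumptions, not the paper's exact definitions.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical membership functions for a 0-100 grade, given as (x, degree) breakpoints.
FUZZY_SETS: Dict[str, List[Point]] = {
    "Low": [(40, 1.0), (60, 0.0)],
    "Middle": [(40, 0.0), (60, 1.0), (80, 0.0)],
    "High": [(60, 0.0), (80, 1.0)],
}


def membership(x: float, label: str) -> float:
    """Piecewise-linear membership; constant beyond the first/last breakpoint."""
    points = FUZZY_SETS[label]
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("breakpoints must be sorted by x")


def fuzzy_support(records: Sequence[Dict[str, float]],
                  itemset: Sequence[Tuple[str, str]]) -> float:
    """Mean over records of the minimum membership across (attribute, label) items."""
    if not records:
        return 0.0
    total = sum(min(membership(rec[attr], label) for attr, label in itemset)
                for rec in records)
    return total / len(records)


# Invented grades for three students.
students = [{"math": 85, "physics": 70}, {"math": 70, "physics": 85}, {"math": 50, "physics": 45}]
print(membership(70, "Middle"))  # 0.5
print(fuzzy_support(students, [("math", "High"), ("physics", "High")]))
```

A grade of 70 belongs partly to both Middle and High, so it contributes to fuzzy itemsets of either term instead of being forced into one crisp interval.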

8.
The quality of discovered association rules is commonly evaluated by interestingness measures (usually support and confidence), whose purpose is to give the user indicators for understanding and using the newly discovered knowledge. Low-quality datasets have a strongly negative impact on the quality of the discovered association rules, and one might legitimately wonder whether a so-called “interesting” rule LHS → RHS is meaningful when 30% of the LHS data are no longer up to date, 20% of the RHS data are inaccurate, and 15% of the LHS data come from a data source well known for its poor credibility. This paper presents an overview of data quality characterization and management techniques that can be advantageously employed to improve the quality awareness of the knowledge discovery and data mining processes. We propose to integrate data quality indicators into association rule mining to make it quality-aware, and we propose a cost-based probabilistic model for selecting legitimately interesting rules. Experiments on the challenging KDD-Cup-98 datasets show that variations in data quality have a great impact on the cost and quality of the discovered association rules, and confirm our approach of integrating the management of data quality indicators into the KDD process to ensure the quality of data mining results.

9.
Data mining extracts implicit, previously unknown, and potentially useful information from databases. Many approaches have been proposed to extract information, and one of the most important ones is finding association rules. Although a large amount of research has been devoted to this subject, none of it finds association rules from directed acyclic graph (DAG) data. Without such a mining method, the hidden knowledge, if any, cannot be discovered from the databases storing DAG data such as family genealogy profiles, product structures, XML documents, task precedence relations, and course structures. In this article, we define a new kind of association rule in DAG databases called the predecessor–successor rule, where a node x is a predecessor of another node y if we can find a path in DAG where x appears before y. The predecessor–successor rules enable us to observe how the characteristics of the predecessors influence the successors. An approach containing four stages is proposed to discover the predecessor–successor rules. © 2006 Wiley Periodicals, Inc. Int J Int Syst 21: 621–637, 2006.
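The predecessor relation defined above (x precedes y if some path in the DAG visits x before y) amounts to graph reachability. A minimal sketch, assuming the DAG is given as an adjacency list, enumerates all such pairs with a depth-first search; the course names are hypothetical, and the later stages of the paper's four-stage approach (mining attribute patterns over these pairs) are not shown.

```python
from typing import Dict, List, Set, Tuple


def predecessor_successor_pairs(dag: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Return every (x, y) such that a directed path leads from x to y."""
    pairs: Set[Tuple[str, str]] = set()
    for start in dag:
        stack = list(dag[start])
        reached: Set[str] = set()
        while stack:  # iterative depth-first search from `start`
            node = stack.pop()
            if node in reached:
                continue
            reached.add(node)
            pairs.add((start, node))
            stack.extend(dag.get(node, []))
    return pairs


# Hypothetical course-prerequisite structure.
courses = {
    "intro": ["data_structures"],
    "data_structures": ["algorithms", "databases"],
    "algorithms": [],
    "databases": [],
}
for x, y in sorted(predecessor_successor_pairs(courses)):
    print(f"{x} is a predecessor of {y}")
```

Note that "intro" precedes "algorithms" even though no edge joins them directly; such indirect pairs are exactly what an edge-only analysis would miss.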

10.
The paper focuses on the adaptive relational association rule mining problem. Relational association rules are a particular type of association rule that describes frequent relations occurring between the features that characterize the instances within a data set. We aim at re-mining a previously mined object set when the feature set characterizing the objects grows. An adaptive relational association rule method, based on the discovery of interesting relational association rules, is proposed. This method, called ARARM (Adaptive Relational Association Rule Mining), adapts the set of rules established by mining the data before the feature set changed, while preserving completeness. We aim to reach the result more efficiently than by rerunning the mining algorithm from scratch on the feature-extended object set. Experiments testing the method's performance on several case studies are also reported. The results highlight the efficiency of the ARARM method and confirm the potential of our proposal.
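To make the notion concrete, a relational association rule between two features can be read as "a REL b holds for at least a minimum fraction of instances". The sketch below is restricted to rules over feature pairs with the relations <, = and > (an assumption; the paper's rules can be longer) and shows why adding a feature only requires evaluating the pairs that involve the new feature. It is not the ARARM algorithm itself.

```python
import operator
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, float]
Rule = Tuple[str, str, str]  # (feature a, relation name, feature b)

RELATIONS: Dict[str, Callable[[float, float], bool]] = {
    "<": operator.lt, "=": operator.eq, ">": operator.gt,
}


def rules_for_pairs(data: List[Row], pairs: Iterable[Tuple[str, str]],
                    min_support: float) -> Dict[Rule, float]:
    """Keep every 'a REL b' whose fraction of satisfying rows reaches min_support."""
    found: Dict[Rule, float] = {}
    if not data:
        return found
    for a, b in pairs:
        for name, rel in RELATIONS.items():
            s = sum(1 for row in data if rel(row[a], row[b])) / len(data)
            if s >= min_support:
                found[(a, name, b)] = s
    return found


def mine(data: List[Row], features: List[str], min_support: float) -> Dict[Rule, float]:
    """Mine from scratch: every pair of features is evaluated."""
    return rules_for_pairs(data, combinations(features, 2), min_support)


def extend(rules: Dict[Rule, float], data: List[Row], old_features: List[str],
           new_feature: str, min_support: float) -> Dict[Rule, float]:
    """Incremental update: only pairs involving the new feature are evaluated."""
    updated = dict(rules)
    updated.update(rules_for_pairs(data, [(f, new_feature) for f in old_features], min_support))
    return updated


# Invented instances; f3 is the feature added after the first mining run.
data = [
    {"f1": 1, "f2": 2, "f3": 3},
    {"f1": 2, "f2": 3, "f3": 1},
    {"f1": 0, "f2": 5, "f3": 4},
]
old = mine(data, ["f1", "f2"], 0.6)
new = extend(old, data, ["f1", "f2"], "f3", 0.6)
print(new == mine(data, ["f1", "f2", "f3"], 0.6))  # True
```

For pair rules the incremental result trivially matches re-mining; the harder part that ARARM addresses is keeping longer rule chains complete.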

11.
Mining association rules plays an important role in data mining and knowledge discovery, since it can reveal strong associations between items in databases. Nevertheless, an important problem with traditional association rule mining methods is that, depending on how their parameters are set, they can generate a huge number of association rules. Yet users are often interested only in the strongest rules and do not want to go through a large number of rules or wait for them to be generated. To address these needs, algorithms have been proposed to mine the top-k association rules in databases, where users directly set a parameter k to obtain the k most frequent rules. However, a major issue with these techniques is that they remain very costly in terms of execution time and memory. To address this issue, this paper presents a novel algorithm named ETARM (Efficient Top-k Association Rule Miner) that efficiently finds the complete set of top-k association rules. The proposed algorithm integrates two novel candidate pruning properties to reduce the search space more effectively. These properties are applied during candidate selection to identify, based on a rule's confidence, items that should not be used to expand that rule, thereby reducing the number of candidates. An extensive experimental evaluation on six standard benchmark datasets shows that the proposed approach outperforms the state-of-the-art TopKRules algorithm in terms of both runtime and memory usage.
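As a baseline for what top-k rule mining returns, the following brute-force sketch enumerates every rule, keeps those meeting a minimum confidence, and returns the k with the highest support (ties broken by confidence, then by item names). It is exponential in the number of items and only pins down the expected output; ETARM and TopKRules reach the same kind of result with pruning, which is not reproduced here.

```python
import heapq
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

RuleRow = Tuple[float, float, Tuple[str, ...], Tuple[str, ...]]  # (support, confidence, lhs, rhs)


def top_k_rules(transactions: List[Set[str]], k: int, min_conf: float) -> List[RuleRow]:
    """Exhaustively enumerate rules and keep the k with the highest support."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})

    def sup(s: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if s <= t) / n

    rules: List[RuleRow] = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            full = frozenset(combo)
            s_full = sup(full)
            if s_full == 0:
                continue
            for r in range(1, size):
                for lhs_items in combinations(combo, r):
                    lhs = frozenset(lhs_items)
                    conf = s_full / sup(lhs)  # sup(lhs) >= s_full > 0
                    if conf >= min_conf:
                        rules.append((s_full, conf, tuple(sorted(lhs)), tuple(sorted(full - lhs))))
    return heapq.nlargest(k, rules)


# Toy data, invented for illustration.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
for s, c, lhs, rhs in top_k_rules(baskets, k=2, min_conf=0.6):
    print(f"{lhs} -> {rhs}  support={s:.2f} confidence={c:.2f}")
```

Fixing k instead of a minimum support means the user never has to guess a threshold; the cost is that the miner must raise its internal threshold dynamically, which is where pruning properties like ETARM's matter.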

12.
In this paper we describe the final version of a knowledge discovery system, the Telecommunication Network Alarm Sequence Analyzer (TASA), for analyzing telecommunication network alarm data. The system is based on the discovery of recurrent temporal patterns of alarms in databases; these patterns, called episode rules, can be used to construct real-time alarm correlation systems. Association rules are also used to identify relationships between alarm properties. TASA uses a methodology for knowledge discovery in databases (KDD) in which one first discovers large collections of patterns at once and then performs interactive retrievals from the collection. The proposed methodology suits KDD formalisms such as association and episode rules very well, since large collections of potentially interesting rules can be found efficiently. When searching for the most interesting rules, simple threshold-like restrictions, such as rule frequency and confidence, may be satisfied by a large number of rules. In TASA, this problem can be alleviated by templates and pattern expressions that describe the form of the rules to be selected or rejected. Using templates, the user can flexibly specify the focus of interest and iteratively refine it. Different versions of TASA have been in prototype use in four telecommunication companies since the beginning of 1995. TASA has been found useful in, e.g., finding long-term, rather frequently occurring dependencies, creating an overview of a short-term alarm sequence, and evaluating the consistency and correctness of the alarm database.
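Episode rules relate alarm types that occur in a given order within a time window. The sketch below computes the frequency of a serial episode as the fraction of sliding windows that contain it, and the confidence of a rule as the ratio of two such frequencies. The window convention (one window per time step, including windows that only partly overlap the sequence) and the alarm data are assumptions made for illustration; TASA's own algorithms and template language are not shown.

```python
from typing import List, Sequence, Tuple

Event = Tuple[int, str]  # (timestamp, alarm type); events are sorted by timestamp


def occurs_in_window(events: Sequence[Event], episode: Sequence[str],
                     start: int, width: int) -> bool:
    """True if the alarm types of `episode` appear in order inside [start, start + width)."""
    matched = 0
    for t, alarm in events:
        if start <= t < start + width and alarm == episode[matched]:
            matched += 1
            if matched == len(episode):
                return True
    return False


def window_frequency(events: Sequence[Event], episode: Sequence[str], width: int) -> float:
    """Fraction of sliding windows (one per time step) that contain the serial episode."""
    if not events:
        return 0.0
    starts = range(events[0][0] - width + 1, events[-1][0] + 1)
    hits = sum(1 for s in starts if occurs_in_window(events, episode, s, width))
    return hits / len(starts)


def episode_rule_confidence(events: Sequence[Event], antecedent: List[str],
                            consequent: List[str], width: int) -> float:
    """Confidence of 'antecedent => antecedent followed by consequent'."""
    base = window_frequency(events, antecedent, width)
    if base == 0:
        return 0.0
    return window_frequency(events, list(antecedent) + list(consequent), width) / base


# Invented alarm sequence.
alarms = [(1, "A"), (2, "B"), (5, "A"), (6, "B"), (9, "A")]
print(window_frequency(alarms, ["A"], 3))  # 9 of the 11 windows contain A
print(episode_rule_confidence(alarms, ["A"], ["B"], 3))
```

Here the rule "A is followed by B within 3 time units" has confidence 4/9: of the windows containing A, fewer than half also show B after it.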

13.
The association rules discovered by traditional support-confidence-based algorithms provide concise statements of potentially useful information hidden in databases. However, considering only the minimum support and minimum confidence constraints is far from satisfactory in many cases. In this paper, we propose a fuzzy method to formulate how interesting an association rule may be. Interestingness is indicated by the rule's membership values in two fuzzy sets (the stronger rule set and the weaker rule set), which provides much more flexibility than traditional methods for discovering potentially more interesting association rules. Furthermore, revised algorithms based on the Apriori algorithm and a matrix structure are designed under this framework.

14.

A novel approach to interactively acquiring knowledge about new objects in a logic environment is presented. When the user supplies an unknown fact containing unknown objects (constants), the system asks interesting membership and existential queries about the objects. The answers to these questions allow the system to update its knowledge base. Two basic strategies are implemented: one that examines existing Horn clauses for the predicate and another that uses types. Furthermore, a powerful analogy-based heuristic for posing the most interesting questions first is presented.

15.
Katsuno and Mendelzon have distinguished two abstract frameworks for reasoning about change: theory revision and theory update. Theory revision involves a change in knowledge or belief with respect to a static world. By contrast, theory update involves a change of knowledge or belief in a changing world. In this paper, we are concerned with theory update. Winslett has shown that theory update should be computed “one model at a time.” Accordingly, we focus exclusively on the update of interpretations. We begin with a study of revision programming, introduced by Marek and Truszczyński to formalize interpretation update in a language similar to logic programming. While revision programs provide a useful and natural definition of interpretation update, they are limited to a fairly restricted set of update rules. Accordingly, we introduce the more general notion of rule update: interpretation update by arbitrary sets of inference rules. We show that Winslett's approach to update by means of arbitrary sets of formulas corresponds to a simple subclass of rule update. We also specify a simple embedding of rule update in Reiter’s default logic, obtained by augmenting the original update rules with default rules encoding the commonsense law of inertia, the principle that things change only when they are made to.

16.
We consider knowledge discovery from data describing a piece of the real or an abstract world. The induced patterns reveal laws hidden in the data. The most natural representation of such pattern-laws is by “if..., then...” decision rules relating conditions to decisions. The same representation of patterns is used in multi-attribute classification; thus, the data searched for these patterns can be seen as classification data. We adopt the classification perspective to present an original methodology for inducing general laws from data and representing them by so-called monotonic decision rules. Monotonicity concerns relationships between the values of condition and decision attributes, e.g. the greater the mass (condition attribute), the greater the gravity (decision attribute), which is a specific feature of decision rules discovered from data using the Dominance-based Rough Set Approach (DRSA). While in DRSA one has to assume a priori the presence or absence of positive or negative monotonicity relationships that hold in the whole evaluation space, in this paper we show that DRSA can be adapted to discover rules from any kind of input classification data, exhibiting monotonicity relationships that are unknown a priori and hold only in some parts of the evaluation space. This requires a proper non-invasive transformation of the classification data, permitting the representation of both positive and negative monotonicity relationships to be discovered by the proposed methodology. The reported results of a computational experiment confirm that the proposed methodology leads to decision rules whose predictive ability is similar to that of the best classification predictors. It has, however, a unique advantage over all competitors, because the monotonic decision rules can be read as laws characterizing the analyzed phenomena in terms of easily understandable “if..., then...” decision rules, whereas other predictor models have no such straightforward interpretation.

17.
Toloo et al. [Toloo, M., Sohrabi, B., & Nalchigar, S. (2009). A new method for ranking discovered rules from data mining by DEA. Expert Systems with Applications, 36, 8503–8508] recently proposed a new integrated data envelopment analysis model to find the most efficient association rule in data mining, and used this model to develop an algorithm for ranking association rules by considering multiple criteria. In this paper, we show that their model selects only one efficient association rule, essentially by chance, and that this selection depends entirely on the solution method or software used to solve the problem. In addition, we show that their proposed algorithm can rank efficient rules only randomly and fails to rank inefficient DMUs. We also point out some other drawbacks of that paper and propose another approach for establishing a full ranking of the association rules. A numerical example illustrates some of these points.

18.
While recent research on rule learning has focused largely on finding highly accurate hypotheses, we evaluate the degree to which these hypotheses are also simple, that is, small. To this end, we compare well-known rule learners, such as CN2, RIPPER, PART, FOIL and C5.0 rules, with the benchmark system SL2, which explicitly aims at computing small rule sets with few literals. The results show that it is possible to obtain a level of accuracy similar to that of state-of-the-art rule learners using much smaller rule sets.

19.
Dai Min (戴敏), Huang Yalou (黄亚楼). Journal of Computer Applications (计算机应用), 2006, 26(1): 207-209
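Association rules are usually presented as rule lists, yet many association rule mining algorithms produce a large number of rules, which makes it very difficult for users to understand the rules and to find the ones they are interested in. To highlight important rules while preserving the completeness of the mining results, this paper proposes presenting association rules hierarchically, from general to specific, according to rule generality. The most general rule set in the mining results first expresses the most general, fundamental domain knowledge; the user can then view, level by level and on demand, the more specific rules beneath each general rule. This representation allows association rules to be viewed at different levels and makes the mining results easier to manage and understand.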
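The general-to-specific presentation can be sketched by arranging rules in a tree in which each rule sits under the more general rules it refines. Since the abstract does not define rule generality precisely, this sketch assumes that one rule is more general than another when both have the same consequent and its antecedent is a proper subset of the other's; the rule set and helper names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str]]  # (antecedent, consequent)


def make_rule(lhs: Iterable[str], rhs: Iterable[str]) -> Rule:
    return (frozenset(lhs), frozenset(rhs))


def more_general(g: Rule, s: Rule) -> bool:
    """Assumed criterion: same consequent, strictly smaller antecedent."""
    return g[1] == s[1] and g[0] < s[0]


def build_hierarchy(rules: List[Rule]) -> Tuple[List[Rule], Dict[Rule, List[Rule]]]:
    """Top level = rules with no more general rule; children = immediate refinements."""
    roots = [r for r in rules if not any(more_general(o, r) for o in rules)]
    children: Dict[Rule, List[Rule]] = {}
    for r in rules:
        refinements = [s for s in rules if more_general(r, s)]
        # Keep only refinements with no intermediate rule between r and s.
        children[r] = [s for s in refinements
                       if not any(more_general(r, t) and more_general(t, s) for t in rules)]
    return roots, children


def render(rule: Rule, children: Dict[Rule, List[Rule]], depth: int = 0) -> List[str]:
    """Indented text view of a rule and, recursively, its more specific rules."""
    lhs, rhs = rule
    lines = ["  " * depth + f"{sorted(lhs)} -> {sorted(rhs)}"]
    for child in children[rule]:
        lines.extend(render(child, children, depth + 1))
    return lines


mined = [
    make_rule(["a"], ["c"]),
    make_rule(["a", "b"], ["c"]),
    make_rule(["a", "b", "d"], ["c"]),
    make_rule(["e"], ["c"]),
    make_rule(["a"], ["f"]),
]
roots, children = build_hierarchy(mined)
for root in roots:
    print("\n".join(render(root, children)))
```

Only the three root rules need to be shown at first; a rule that refines two different general rules would appear under both, which is one reason a real interface might show each subtree on demand.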

20.
In an electronic commerce (EC) environment, effectively mining useful transaction information is an important issue for most enterprises when designing their marketing strategies. In particular, the relationships between different databases (e.g., the transaction database and the online browsing database) may hold unknown and potentially valuable business intelligence. This study addresses two important issues in mining association rules for EC applications. The first is the discovery of generalized fuzzy association rules in the transaction database. The second is the discovery of association rules from web usage data and the large itemsets identified in the transaction database. A cluster-based fuzzy association rules (CBFAR) mining architecture is proposed to address both issues simultaneously. The study makes three contributions: (a) an efficient fuzzy association rule miner based on cluster-based fuzzy-set tables is presented to identify all large fuzzy itemsets; (b) the approach requires less contrast to generate large itemsets; and (c) a fuzzy rule mining approach is used to compute confidence values for discovering the relationships between the transaction database and the browsing information database. Finally, a simulated example in an EC environment demonstrates the rationality and feasibility of the proposed approach.


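Copyright © Beijing Qinyun Technology Development Co., Ltd.    Beijing ICP Registration No. 09084417-23

Beijing Public Security Network Registration No. 11010802026262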