Similar Documents
20 similar documents found (search time: 31 ms)
1.
Induction by attribute elimination   (Cited by: 6; self-citations: 0; other citations: 0)

2.
Research on weighted Boolean association rules   (Cited by: 11; self-citations: 2; other citations: 11)
To overcome the shortcoming that the weighted support may exceed 1, the attribute weight set is normalized and a first type of weighted association rule mining algorithm is proposed. This algorithm accounts effectively for attribute weights and treats a rule's importance as increasing with the number of attributes it contains. In some databases, however, mining association rules only requires considering attribute weights; that is, a rule's importance does not grow with the number of attributes in the rule. By weighting the contribution of each element, this paper constructs a weighted database and proposes a second type of weighted association rule mining algorithm to handle such cases.
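The abstract does not state the exact weighting formulas, so the Python sketch below is only one plausible reading of the two notions: weighted support taken as ordinary support scaled by the sum (type 1) or the average (type 2) of normalized attribute weights, which also keeps it at or below 1. The transactions and weight values are invented for illustration.

```python
from itertools import combinations

# Hypothetical illustration of the two weighted-support notions described above.
# Normalizing the weights so they sum to 1 keeps weighted support <= 1.
transactions = [
    {"bread", "milk"},
    {"bread", "beer"},
    {"bread", "milk", "beer"},
    {"milk"},
]
raw_weights = {"bread": 0.4, "milk": 0.9, "beer": 0.7}
total = sum(raw_weights.values())
weights = {item: w / total for item, w in raw_weights.items()}  # normalization

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def weighted_support_type1(itemset):
    # Type 1: importance grows with itemset size (sum of normalized weights).
    return sum(weights[i] for i in itemset) * support(itemset)

def weighted_support_type2(itemset):
    # Type 2: importance independent of itemset size (average weight).
    return sum(weights[i] for i in itemset) / len(itemset) * support(itemset)

for k in (1, 2):
    for items in combinations(sorted(weights), k):
        s = frozenset(items)
        print(set(s), round(weighted_support_type1(s), 3), round(weighted_support_type2(s), 3))
```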

3.
Research on weighted fuzzy association rules   (Cited by: 1; self-citations: 0; other citations: 1)
1 Introduction. Association rules express conditions under which attribute-value pairs frequently occur together in a given data set; the most common application is market basket analysis of a large supermarket's transaction database, for which reference [1] proposed the Apriori algorithm for Boolean-attribute association rules. Quantitative associations have important applications in stock market analysis, bank deposit analysis, medical diagnosis, and many other areas. They describe relationships between quantitative attribute features and are expressed as quantitative association rules, such as "10% of married people aged 50-70 own at least two cars." Reference [2] first discussed quantitative association rules; its mining algorithm partitions quantitative attributes into multiple intervals, but this approach suffers from overly rigid partition boundaries.

4.
Validity interval analysis (VIA) is a generic tool for analyzing the input-output behavior of feedforward neural networks. VIA is a rule extraction technique that relies on a rule refinement algorithm. The rules are of the form R(i) --> R(0), i.e. "if the input of the neural network is in the region R(i), then its output is in the region R(0)," where regions are axis-parallel hypercubes. VIA conjectures rules, then refines them and checks them for inconsistency. This process can be computationally expensive, so the rule refinement phase becomes critical and knowing the complexity of these rule refinement algorithms matters. In this paper, we show that the rule refinement part of VIA always converges in one run for single-weight-layer networks, and has an exponential average rate of convergence for multilayer networks. We also discuss some variations of the standard VIA formulae.
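As a rough illustration of the kind of computation such rules rest on, the sketch below propagates an axis-parallel input hypercube through a single weight layer and checks whether the image fits inside an output region. It is only interval propagation under invented weights and regions, not the full VIA refinement procedure.

```python
import numpy as np

# Minimal sketch of forward interval propagation through one weight layer,
# the basic operation that VIA-style rule checking builds on.
def propagate_interval(W, b, lo, hi):
    """Map an axis-parallel input hypercube [lo, hi] through y = W x + b,
    returning a bounding hypercube of the image (exact for one linear layer)."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    y_lo = W_pos @ lo + W_neg @ hi + b
    y_hi = W_pos @ hi + W_neg @ lo + b
    return y_lo, y_hi

# Invented weights, bias, and regions purely for illustration.
W = np.array([[1.0, -2.0], [0.5, 0.5]])
b = np.array([0.0, -0.25])
lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])

y_lo, y_hi = propagate_interval(W, b, lo, hi)
# A rule "input in R_i => output in R_0" is consistent only if the propagated
# box fits inside R_0; here R_0 is an arbitrary example region.
R0_lo, R0_hi = np.array([-2.5, -1.0]), np.array([1.5, 1.0])
print(np.all(y_lo >= R0_lo) and np.all(y_hi <= R0_hi))
```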

5.
Research on weighted association rule mining algorithms   (Cited by: 20; self-citations: 0; other citations: 20)
This paper discusses mining algorithms for weighted association rules. For Boolean attributes, an improved weighted association rule mining algorithm is given on the basis of the MINWAL(O) and MINWAL(W) algorithms; it effectively accounts for both the importance of the Boolean attributes and the number of attributes contained in a rule. For quantitative attributes, the competitive agglomeration clustering algorithm is applied to partition each quantitative attribute into several fuzzy sets, and a mining algorithm for weighted fuzzy association rules is proposed systematically; it effectively accounts for both the importance of the quantitative attributes and the number of attributes contained in a rule, and is suitable for large databases.

6.
In this paper, we address the problem of managing inconsistent databases, i.e., databases violating integrity constraints. We propose a general logic framework for computing repairs and consistent answers over inconsistent databases. A repair for a possibly inconsistent database is a minimal set of insert and delete operations which makes the database consistent, whereas a consistent answer is a set of tuples derived from the database, satisfying all integrity constraints. In our framework, different types of rules defining general integrity constraints, repair constraints (i.e., rules defining conditions on the insertion or deletion of atoms), and prioritized constraints (i.e., rules defining priorities among updates and repairs) are considered. We propose a technique based on the rewriting of constraints into (prioritized) extended disjunctive rules with two different forms of negation (negation as failure and classical negation). The disjunctive program can be used for two different purposes: to compute "repairs" for the database, and to produce consistent answers, i.e., a maximal set of atoms which do not violate the constraints. We show that our technique is sound, complete (each preferred stable model defines a repair and each repair is derived from a preferred stable model), and more general than techniques previously proposed.

7.
Quantitative association rule mining based on mutual information entropy between attributes   (Cited by: 2; self-citations: 1; other citations: 1)
Quantitative association rule mining suffers from a combinatorial explosion of quantitative attributes and their value intervals, which hurts algorithm efficiency. The algorithm BMIQAR is proposed: by examining the mutual information entropy between quantitative attributes, it finds attribute sets with strong informational relationships and derives frequent itemsets from them to generate rules. Experiments show that pruning at the attribute level shrinks the search space, improves the algorithm's performance, and still yields the vast majority of the high-confidence rules.
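The sketch below illustrates only the attribute-level pruning idea: it estimates mutual information between two discretized quantitative attributes and would keep just the strongly related pairs for interval generation. It is not the published BMIQAR code; the synthetic data, bin count, and threshold choice are assumptions.

```python
import numpy as np
from collections import Counter

def mutual_information(x, y, bins=4):
    """Estimate MI between two numeric attributes after equal-width binning."""
    x_d = np.digitize(x, np.histogram_bin_edges(x, bins)[1:-1])
    y_d = np.digitize(y, np.histogram_bin_edges(y, bins)[1:-1])
    n = len(x)
    joint = Counter(zip(x_d, y_d))
    px, py = Counter(x_d), Counter(y_d)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * np.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

rng = np.random.default_rng(0)
age = rng.uniform(20, 70, 500)
income = 1000 * age + rng.normal(0, 5000, 500)   # informationally related to age
noise = rng.uniform(0, 1, 500)                   # unrelated attribute

print(mutual_information(age, income), mutual_information(age, noise))
# Only attribute pairs above a chosen MI threshold would be expanded into
# value intervals and frequent itemsets.
```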

8.
The implication of multivalued dependencies in relational databases was originally defined in the context of some fixed finite universe. While axiomatisability and implication problems have been intensely studied with respect to this notion, almost no research has been devoted to the alternative notion of implication in which the underlying universe of attributes is left undetermined. Based on a set of common inference rules we establish all axiomatisations in undetermined universes, and all axiomatisations in fixed universes that indicate the role of the complementation rule as a means of database normalisation. This characterises the expressiveness of several incomplete sets of inference rules. We also establish relationships between axiomatisations in fixed and undetermined universes, and study the time complexity of the implication problem in undetermined universes. The results of this paper establish a foundation for reasoning about multivalued dependencies without the assumption of a fixed underlying universe.

9.
Starting from fuzzy binary data represented as tables in a fuzzy relational database, in this paper we use fuzzy formal concept analysis to reduce table sizes so as to keep only the minimal rows in each table, without losing knowledge (i.e., the association rules extracted from the reduced databases are identical at a given precision level). More specifically, we develop a fuzzy extension of a previously proposed algorithm for crisp data reduction without loss of knowledge. The fuzzy Galois connection based on the Lukasiewicz implication is mainly used in the definition of the closure operator according to a precision level, which makes data reduction sensitive to the variation of this precision level.
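For readers unfamiliar with the machinery, the following sketch shows the Lukasiewicz implication I(a, b) = min(1, 1 - a + b) inside the standard fuzzy concept-forming (Galois) operators and a closure computed from them, truncated at a precision level. The tiny fuzzy table and the precision value are invented, and the sketch does not reproduce the paper's reduction algorithm.

```python
# Objects x attributes with fuzzy membership degrees (made-up example data).
table = {
    "o1": {"a": 1.0, "b": 0.8},
    "o2": {"a": 0.9, "b": 0.7},
    "o3": {"a": 0.3, "b": 1.0},
}

def luk(a, b):
    # Lukasiewicz implication (residuum): I(a, b) = min(1, 1 - a + b).
    return min(1.0, 1.0 - a + b)

def intent(obj_degrees):
    """Degree to which each attribute is shared by the fuzzy set of objects."""
    attrs = {a for row in table.values() for a in row}
    return {a: min(luk(obj_degrees[o], table[o][a]) for o in table) for a in attrs}

def extent(attr_degrees):
    """Degree to which each object has all attributes of the fuzzy attribute set."""
    return {o: min(luk(attr_degrees[a], table[o][a]) for a in attr_degrees) for o in table}

objs = {"o1": 1.0, "o2": 1.0, "o3": 0.0}
closure = extent(intent(objs))          # composition of the two Galois maps
precision = 0.8                         # precision level: drop small degrees
print({o: (d if d >= precision else 0.0) for o, d in closure.items()})
```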

10.
Multiobjective genetic fuzzy rule selection is based on the generation of a set of candidate fuzzy classification rules using a preestablished granularity, or multiple fuzzy partitions with different granularities, for each attribute. Then, a multiobjective evolutionary algorithm is applied to perform fuzzy rule selection. Since using multiple granularities for the same attribute has sometimes been pointed out to involve a potential loss of interpretability, a mechanism to specify appropriate single granularities at the rule extraction stage has been proposed to avoid this while maintaining or even improving classification performance. In this work, we perform a statistical study of this proposal and extend it by combining the single-granularity-based approach with a lateral tuning of the membership functions, i.e., complete context learning. In this way, we analyze in depth the importance of determining the appropriate contexts for learning fuzzy classifiers. To this end, we compare the single-granularity-based approach with the use of multiple granularities, with and without tuning. The results show that the performance of the obtained classifiers can be further improved by obtaining the appropriate variable contexts, i.e., appropriate granularities and membership function parameters.

11.
Recursive neural network rule extraction for data with mixed attributes   (Cited by: 1; self-citations: 0; other citations: 1)
In this paper, we present a recursive algorithm for extracting classification rules from feedforward neural networks (NNs) that have been trained on data sets having both discrete and continuous attributes. The novelty of this algorithm lies in the conditions of the extracted rules: the rule conditions involving discrete attributes are disjoint from those involving continuous attributes. The algorithm starts by first generating rules with discrete attributes only to explain the classification process of the NN. If the accuracy of a rule with only discrete attributes is not satisfactory, the algorithm refines this rule by recursively generating more rules with discrete attributes not already present in the rule condition, or by generating a hyperplane involving only the continuous attributes. We show that for three real-life credit scoring data sets, the algorithm generates rules that are not only more accurate but also more comprehensible than those generated by other NN rule extraction methods.

12.
An improved Apriori algorithm for alert correlation in intrusion detection systems   (Cited by: 2; self-citations: 1; other citations: 1)
王台华, 万宇文, 郭帆, 余敏. 《计算机应用》, 2010, 30(7): 1785-1788
Among the many association rule mining algorithms, Apriori is the most classical, but it has the following drawbacks: it scans the database multiple times, generates a large number of candidate sets, and derives frequent itemsets iteratively. A method is proposed that obtains the maximal frequent itemsets in a single pass of intersection operations: support is obtained from the intersection counts, so the transaction database does not need to be rescanned, and numbering some of the attributes reduces storage space and makes searching the candidate list easier, thereby improving the algorithm's efficiency. Finally, association rules are formed for the intrusion detection system. Experimental results show that the optimized algorithm effectively improves the efficiency of association rule mining.
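The abstract gives no implementation details, so the sketch below only illustrates the general vertical, intersection-based technique it alludes to: record for each item the set of transaction IDs containing it in one scan, then compute any itemset's support as the size of the intersection of those TID sets. The alert items and threshold are invented.

```python
from collections import defaultdict
from itertools import combinations

# Invented alert "transactions" for illustration only.
alerts = [
    {"src:10.0.0.1", "sig:portscan"},
    {"src:10.0.0.1", "sig:portscan", "dst:web"},
    {"src:10.0.0.2", "sig:bruteforce", "dst:ssh"},
    {"src:10.0.0.1", "dst:web"},
]

tidsets = defaultdict(set)
for tid, alert in enumerate(alerts):        # single scan of the database
    for item in alert:
        tidsets[item].add(tid)

def support(itemset):
    # Support = size of the intersection of the items' TID sets; no rescan needed.
    return len(set.intersection(*(tidsets[i] for i in itemset)))

min_support = 2
frequent_pairs = [p for p in combinations(sorted(tidsets), 2) if support(p) >= min_support]
print(frequent_pairs)
```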

13.
Classification and recognition based on rules and their parameters in rough set theory   (Cited by: 1; self-citations: 0; other citations: 1)
Rough set theory provides an important method for acquiring rules of the if...then... form, but existing methods generally do not consider the weight information of attributes in the decision table. Moreover, when rules are used for classification and recognition, other information may also be exploited, such as the equivalence classes from which rules are derived and the length of rule antecedents. Therefore, the authors describe a rule comprehensively in terms of several parameters: its support, accuracy, applicability, weight, the equivalence class from which it is derived, and its antecedent length. Using rules together with these parameters, several classification and recognition strategies are proposed.

14.
Rough-fuzzy MLP: modular evolution, rule generation, and evaluation   (Cited by: 8; self-citations: 0; other citations: 8)
A methodology is described for evolving a rough-fuzzy multilayer perceptron with a modular concept using a genetic algorithm to obtain a structured network suitable for both classification and rule extraction. The modular concept, based on a "divide and conquer" strategy, provides accelerated training and a compact network suitable for generating a minimum number of rules with high certainty values. The concept of a variable mutation operator is introduced for preserving the localized structure of the constituting knowledge-based subnetworks while they are integrated and evolved. Rough set dependency rules are generated directly from the real-valued attribute table containing fuzzy membership values. Two new indices, viz. "certainty" and "confusion" in a decision, are defined for quantitatively evaluating the quality of rules. The effectiveness of the model and the rule extraction algorithm is extensively demonstrated through experiments along with comparisons.

15.
Redundancy elimination and clustering of association rules   (Cited by: 9; self-citations: 0; other citations: 9)
Association rule mining often produces a very large number of rules, which makes it difficult for users to analyze and exploit them; the problem is especially pronounced when the attributes in the database are highly correlated. To help users perform exploratory analysis, various techniques can be applied to reduce the number of rules effectively, such as constrained association rule mining and the clustering or generalization of rules. This paper proposes an algorithm ADRR for deleting redundant association rules and an algorithm ACAR for clustering association rules. Based on the properties of sets, it is proved that the mined association rules contain a large number of redundant rules that can be deleted, which leads to the ADRR algorithm. The ACAR algorithm adopts a new way of defining the distance between rules using the correlation between items, and clusters the association rules following the idea of the DBSCAN algorithm. Finally, the proposed algorithms are implemented, and the experimental results show that they are effective and feasible, with relatively high efficiency.
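The abstract does not spell out how item correlation is turned into a rule distance, so the sketch below shows one plausible construction: average a co-occurrence-based correlation over the item pairs drawn from two rules and convert it into a distance that a density-based (DBSCAN-style) pass could then cluster. The data and the exact measure are assumptions, not ACAR's definitions.

```python
from itertools import product

# Invented market-basket transactions for illustration only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"beer", "chips", "milk"},
]

def item_corr(a, b):
    # Lift-style correlation between two items.
    n = len(transactions)
    pa = sum(a in t for t in transactions) / n
    pb = sum(b in t for t in transactions) / n
    pab = sum(a in t and b in t for t in transactions) / n
    return pab / (pa * pb) if pa and pb else 0.0

def rule_distance(rule1, rule2):
    """Rules are (antecedent, consequent) item sets; distance is small when the
    items of one rule are strongly correlated with the items of the other."""
    items1, items2 = rule1[0] | rule1[1], rule2[0] | rule2[1]
    avg_corr = sum(item_corr(a, b) for a, b in product(items1, items2)) / (
        len(items1) * len(items2))
    return 1.0 / (1.0 + avg_corr)

r1 = ({"bread"}, {"butter"})
r2 = ({"bread", "milk"}, {"butter"})
r3 = ({"beer"}, {"chips"})
print(rule_distance(r1, r2), rule_distance(r1, r3))
# Rules about correlated items land close together and can be grouped by a
# density-based clustering pass.
```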

16.
Given a transaction database as a global set of transactions and its local database obtained by some conditioning of the global database, we consider pairs of itemsets whose degrees of correlation are higher in the local database than in the global one. The problem of finding paired itemsets with high correlation in one database is already known as discovery of correlation, and has been studied because highly correlated itemsets are characteristic of the database. However, even noncharacteristic paired itemsets are also meaningful provided the degree of correlation increases significantly in the local database compared with the global one. They can be implicit and hidden evidence that something particular to the local database occurs, even though they were not previously realized to be characteristic. From this viewpoint, we have proposed measuring the significance of paired itemsets by the difference of the two correlations before and after the conditioning of the global database, and have defined the notion of DC pairs, whose degrees of difference of correlation are high. In this paper, we develop an algorithm for mining DC pairs and apply it to a transaction database with time stamp data. The problem of finding DC pairs for large databases is computationally hard in general, as the algorithm has to check even noncharacteristic paired itemsets. However, we show that our algorithm, equipped with some pruning rules, works successfully to find DC pairs that may be significant.
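To make the "difference of correlation" idea concrete, the sketch below compares a pair of itemsets' correlation in a global database and in a local database obtained by conditioning on a time-stamp tag, using lift as the correlation measure. The measure and the toy data are assumptions chosen for illustration, not the paper's exact definitions.

```python
# Invented transactions; "jan"/"jul" play the role of time-stamp tags.
global_db = [
    {"jan", "umbrella", "boots"},
    {"jan", "umbrella", "boots"},
    {"jan", "sunscreen"},
    {"jul", "umbrella"},
    {"jul", "boots"},
    {"jul", "sunscreen"},
]
local_db = [t for t in global_db if "jan" in t]   # conditioning on "jan"

def lift(db, x, y):
    # Lift used here as a stand-in correlation measure for a pair of itemsets.
    n = len(db)
    p_x = sum(x <= t for t in db) / n
    p_y = sum(y <= t for t in db) / n
    p_xy = sum((x | y) <= t for t in db) / n
    return p_xy / (p_x * p_y) if p_x and p_y else 0.0

x, y = {"umbrella"}, {"boots"}
diff = lift(local_db, x, y) - lift(global_db, x, y)
print(lift(global_db, x, y), lift(local_db, x, y), diff)
# A pair whose difference exceeds a chosen threshold would be reported as a DC pair.
```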

17.
Full hierarchical dependencies (FHDs) constitute a large class of relational dependencies. A relation exhibits an FHD precisely when it is the natural join over at least two of its projections that all share the same join attributes. Therefore, FHDs generalise multivalued dependencies (MVDs), for which the number of these projections is precisely two. The implication of FHDs was originally defined in the context of some fixed finite universe. This paper identifies a sound and complete set of inference rules for the implication of FHDs. This axiomatisation is very reminiscent of that for MVDs. Then, an alternative notion of FHD implication is introduced in which the underlying set of attributes is left undetermined. The first main result establishes a finite axiomatisation for FHD implication in undetermined universes. It is then formally clarified that the complementation rule is only a mere means for database normalisation. In fact, the second main result establishes a finite axiomatisation for FHD implication in fixed universes which allows FHDs to be inferred either without using the complementation rule at all or only in the very last step of the inference. This also characterises the expressiveness of an incomplete set of inference rules in fixed universes. The results extend previous work on MVDs by Biskup.

18.
Applied Soft Computing, 2008, 8(1): 646-656
In this paper, a Pareto-based multi-objective differential evolution (DE) algorithm is proposed as a search strategy for mining accurate and comprehensible numeric association rules (ARs) which are optimal in the wider sense that no other rules are superior to them when all objectives are considered simultaneously. The proposed DE guides the search for ARs toward the global Pareto-optimal set while maintaining adequate population diversity to capture as many high-quality ARs as possible. The AR mining problem is formulated as a four-objective optimization problem: support, confidence, and the comprehensibility of the rule are maximization objectives, while the amplitude of the intervals that form the itemset and rule is the minimization objective. The method is designed to search simultaneously for the intervals of the numeric attributes and for the ARs that these intervals form, in a single run of DE. Contrary to the usual methods, ARs are mined directly without generating frequent itemsets. The proposed DE follows a database-independent approach which does not rely on the minimum support and minimum confidence thresholds that are hard to determine for each database. The efficiency of the proposed DE is validated on synthetic and real databases.
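A minimal sketch of how one candidate numeric rule could be scored on the four objectives just described: support and confidence as maximization objectives, a crude comprehensibility proxy, and interval amplitude as the minimization objective. The rule encoding, the comprehensibility proxy, and the toy data are assumptions; this is not the paper's DE implementation.

```python
import numpy as np

# Invented numeric data: columns are age and income.
data = np.array([[23, 1500], [35, 2400], [41, 2600], [29, 1900], [52, 3100]])

def evaluate(rule):
    """rule = ((lo, hi) for the antecedent attribute, (lo, hi) for the consequent attribute)."""
    (a_lo, a_hi), (c_lo, c_hi) = rule
    antecedent = (data[:, 0] >= a_lo) & (data[:, 0] <= a_hi)
    both = antecedent & (data[:, 1] >= c_lo) & (data[:, 1] <= c_hi)
    support = both.mean()
    confidence = both.sum() / antecedent.sum() if antecedent.any() else 0.0
    num_attrs_in_rule = 2                           # age and income
    comprehensibility = 1.0 / num_attrs_in_rule     # crude proxy: fewer attributes = simpler
    amplitude = ((a_hi - a_lo) / np.ptp(data[:, 0]) +
                 (c_hi - c_lo) / np.ptp(data[:, 1])) / 2
    # A multi-objective DE would maximise the first three and minimise amplitude.
    return support, confidence, comprehensibility, amplitude

print(evaluate(((25, 45), (1800, 2700))))
```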

19.
The association rules discovered by traditional support-confidence based algorithms provide us with concise statements of potentially useful information hidden in databases. However, considering only the constraints of minimum support and minimum confidence is far from satisfactory in many cases. In this paper, we propose a fuzzy method for formulating how interesting an association rule may be. Interestingness is indicated by the membership values belonging to two fuzzy sets (i.e., the stronger rule set and the weaker rule set), which provides much more flexibility than traditional methods for discovering some potentially more interesting association rules. Furthermore, revised algorithms based on the Apriori algorithm and a matrix structure are designed under this framework.
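As a rough illustration of grading rules by membership in two fuzzy sets rather than by a single hard cut-off, the sketch below assigns each rule complementary membership degrees in a "stronger" and a "weaker" set from its confidence. The piecewise-linear membership functions and breakpoints are assumptions made for illustration, not the paper's definitions.

```python
def stronger_membership(confidence, low=0.5, high=0.9):
    # Linearly ramps from 0 (at `low`) to 1 (at `high`); breakpoints are invented.
    if confidence <= low:
        return 0.0
    if confidence >= high:
        return 1.0
    return (confidence - low) / (high - low)

def weaker_membership(confidence, low=0.5, high=0.9):
    # Complementary degree in the "weaker rule" set.
    return 1.0 - stronger_membership(confidence, low, high)

for conf in (0.45, 0.6, 0.75, 0.95):
    print(conf, round(stronger_membership(conf), 2), round(weaker_membership(conf), 2))
# A rule with confidence 0.75 is partly "strong" (0.62) and partly "weak" (0.38),
# which is more flexible than a single hard minimum-confidence threshold.
```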

20.
Much of the research on extracting rules from a large amount of data has focused on the extraction of a general rule that covers as many data as possible. In the field of health care, where people's lives are at stake, it is necessary to diagnose appropriately without overlooking the small number of patients who show different symptoms. Thus, the exceptional rules for rare cases are also important. From such a viewpoint, multiple rules, each of which covers a part of the data, are needed to cover all the data. In this paper, we describe the extraction of such multiple rules, each of which is expressed by a tree-structural program. We consider a multi-agent approach to be effective for this purpose. Each agent has a rule that covers a part of the data set, and multiple rules which cover all the data are extracted by multi-agent cooperation. In order to realize this approach, we propose a new method for rule extraction using Automatically Defined Groups (ADG). The ADG, which is based on Genetic Programming, is an evolutionary optimization method for multi-agent systems. By using this method, we can acquire both the number of necessary rules and the tree-structural programs which represent these respective rules. We applied this method to a database used in the machine learning field and showed its effectiveness. Moreover, we applied this method to medical data and developed a diagnostic system for coronary heart disease.
