Similar literature
1.
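Interestingness computation and pruning of frequent patterns based on Bayesian networks   Total citations: 2, self-citations: 0, citations by others: 2
Hu Chunling, Wu Xindong, Hu Xuegang, Yao Hongliang. Journal of Software (《软件学报》), 2011, 22(12): 2934-2950
Using a Bayesian network to represent domain knowledge, this paper proposes BN-EJTR, a domain-knowledge-based method for computing the interestingness of frequent itemsets and frequent attribute sets and for pruning them. Its aim is to discover knowledge that is inconsistent with the current domain knowledge, thereby addressing the interestingness and redundancy problems of frequent pattern mining. To meet the need for batch inference during interestingness computation, BN-EJTR provides a Bayesian network inference algorithm based on extended junction tree elimination, which computes the support of a large number of itemsets in the Bayesian network. BN-EJTR also provides a pruning algorithm based on an interestingness threshold and topological interestingness. Experimental results show that, compared with similar methods, BN-EJTR has good time performance and a clear pruning effect. Analysis shows that the frequent attribute sets and frequent itemsets that remain after pruning meet the interestingness requirement with respect to the domain knowledge.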
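The idea of scoring an itemset by how far the data disagree with the domain model can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give BN-EJTR's interestingness formula or its junction tree inference, so the sketch assumes interestingness is the absolute difference between the support observed in the data and the support predicted by the model, and it uses an edge-free Bayesian network (all items independent) as a stand-in for real domain knowledge. All function names are hypothetical.

```python
from itertools import combinations


def empirical_support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def model_support(marginals, itemset):
    """Support predicted by an edge-free Bayesian network (assumption):
    with no edges, the joint probability is the product of the marginals."""
    p = 1.0
    for item in itemset:
        p *= marginals[item]
    return p


def interesting_pairs(transactions, threshold):
    """Keep 2-itemsets whose observed support deviates from the model's
    prediction by at least `threshold`; the rest are pruned."""
    items = sorted(set().union(*transactions))
    marginals = {i: empirical_support(transactions, frozenset([i])) for i in items}
    kept = {}
    for a, b in combinations(items, 2):
        pair = frozenset([a, b])
        score = abs(empirical_support(transactions, pair) - model_support(marginals, pair))
        if score >= threshold:
            kept[pair] = score
    return kept


transactions = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk"}),
    frozenset({"milk"}),
]
kept = interesting_pairs(transactions, threshold=0.2)
```

Here only the pair butter and milk survives: the independence model predicts a support of 0.25, while the data never show the two together.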

2.
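Based on the concepts and representations of association rules and classification rules, this paper points out that if association rule mining is restricted to associations with one specified item, it becomes classification rule mining. It argues that classification rules are a special case of association rules and describes the characteristics that association rules have in this special case. Building on this argument, it proposes an algorithm that discovers classification rules within an association rule mining algorithm by using a constrained conditional probability distribution.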
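The observation that a classification rule is an association rule whose consequent is fixed to a class label can be illustrated with a short sketch. It is not the paper's algorithm, which the abstract does not specify. It assumes single-item antecedents and uses the standard support and confidence measures, where confidence is the conditional probability of the class given the antecedent. The name `class_rules` and the toy records are hypothetical.

```python
def class_rules(records, min_support, min_confidence):
    """Mine rules of the form item -> class label.

    records: list of (set_of_items, class_label) pairs.
    Returns {(item, label): (support, confidence)}.
    """
    n = len(records)
    antecedent_counts = {}  # how many records contain the item
    rule_counts = {}        # how many records contain the item AND have the label
    for items, label in records:
        for item in items:
            antecedent_counts[item] = antecedent_counts.get(item, 0) + 1
            rule_counts[(item, label)] = rule_counts.get((item, label), 0) + 1

    rules = {}
    for (item, label), count in rule_counts.items():
        support = count / n
        confidence = count / antecedent_counts[item]  # P(label | item)
        if support >= min_support and confidence >= min_confidence:
            rules[(item, label)] = (support, confidence)
    return rules


records = [
    ({"fever", "cough"}, "flu"),
    ({"fever"}, "flu"),
    ({"cough"}, "cold"),
    ({"fever", "cough"}, "flu"),
]
rules = class_rules(records, min_support=0.5, min_confidence=0.8)
```

Restricting the consequent to the class attribute is the only change relative to ordinary rule mining, so only counts that involve a class label need to be kept.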

3.

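This paper gives a formal representation of a data model with varying item weights, constructs a new weighted itemset pruning strategy together with the pattern evaluation framework SCCI (support-confidence-correlation-interest), and proposes a weighted positive and negative association rule mining algorithm based on varying item weights and the SCCI framework. The algorithm takes the varying item weights of the data into account, adopts the new pruning method and evaluation framework, and mines valid weighted positive and negative association rules through simple computation and comparison of itemset weights. Experimental results show that the algorithm effectively reduces the number of candidate itemsets and the mining time, mines interesting association patterns, avoids invalid patterns, achieves higher mining efficiency than the existing algorithms it is compared with, and solves the problem of mining weighted negative patterns when item weights vary.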


4.
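Research on a clustering-based fuzzy genetic mining algorithm   Total citations: 2, self-citations: 0, citations by others: 2
By analyzing the characteristics of continuous attribute data and existing association rule mining algorithms, this paper further studies the accuracy of quantitative description and the efficiency of the algorithm. Existing fuzzy genetic mining algorithms compute the fitness of a chromosome by combining the large 1-itemsets with membership function values, which is slow. To address this, a clustering-based fuzzy genetic association rule mining algorithm is proposed. The algorithm applies fuzzy genetic principles to extract association rules and membership functions from transaction data at the same time. It also uses the k-means clustering algorithm to group the chromosomes in the population and evaluates the fitness of each chromosome from the information obtained by clustering together with the chromosome's own information, which reduces the number of database scans. Test results show that the algorithm is fast and accurate.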

5.
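The black hole pattern is a landmark result in research on human mobility patterns, but it is limited in modeling how mobility patterns evolve, so this paper studies black hole patterns with temporal evolution. The new pattern definition must satisfy three requirements: group scale, spatial locality, and temporal persistence. A dynamic spatial network model with temporal evolution is proposed; based on this model, a new black hole pattern is defined and a corresponding mining algorithm is proposed. To improve the efficiency of the mining algorithm, a candidate pattern pruning algorithm based on spatio-temporal partitioning is designed, which effectively reduces the search cost of the mining algorithm in the spatio-temporal dimensions. Finally, experimental results on real data demonstrate the effectiveness and feasibility of the black hole pattern and its mining algorithm.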

6.
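In evolutionary algorithms, the preselection operator screens out promising candidate offspring solutions for the subsequent environmental selection process. Most existing preselection operators are based on fitness evaluation, surrogate models, or classification models. Since preselection is essentially a classification process, classification-based preselection is a natural fit for evolutionary algorithms. Previous work used binary or multi-class classification models for preselection, which requires preparing in advance either two groups of training samples, "good" and "bad", or several distinguishable groups, in order to build the classification model. As the evolutionary algorithm runs, however, the boundary between "good" and "bad" solutions becomes increasingly blurred, so preparing two or more distinguishable groups of training samples becomes challenging. To solve this problem, this paper proposes a One-class Classification based PreSelection (OCPS) strategy. It first treats all solutions in the current population as "good" samples, then builds a one-class classification model using only these "good" samples, and finally uses the model to label and select among the generated candidate solutions. The strategy is applied to three representative evolutionary algorithms, and numerical experiments show that it improves the convergence speed of existing evolutionary algorithms.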
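The three steps described above (treat the population as the "good" class, fit a one-class model, label and select candidates) can be sketched roughly as follows. The abstract does not say which one-class model OCPS uses, so this sketch assumes a very simple one: a centroid with a radius that just covers the population. The tie-breaking rule (pick the accepted candidate nearest the centroid) is likewise an assumption, and all names are hypothetical.

```python
import math


def fit_one_class(population):
    """Fit a centroid-and-radius model using only 'good' samples (assumed model)."""
    dim = len(population[0])
    centroid = [sum(ind[d] for ind in population) / len(population) for d in range(dim)]
    radius = max(math.dist(ind, centroid) for ind in population)
    return centroid, radius


def preselect(candidates, model):
    """Label candidates inside the radius as 'good' and return the one
    closest to the centroid; fall back to all candidates if none is accepted."""
    centroid, radius = model
    accepted = [c for c in candidates if math.dist(c, centroid) <= radius]
    pool = accepted or candidates
    return min(pool, key=lambda c: math.dist(c, centroid))


population = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
candidates = [(5.0, 5.0), (1.5, 1.0), (3.0, 3.0)]
chosen = preselect(candidates, fit_one_class(population))
```

No "bad" samples are needed at any point, which is the property the abstract emphasizes.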

7.
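Database mining should discover arbitrary interesting patterns in a database both fully and quickly, but the arbitrary composite patterns that arise in real data mining queries are very complex, and mining such complex patterns directly with existing frequent-itemset-based algorithms alone is difficult. To solve this problem, conditional pattern mining based on frequent itemsets is proposed. The paper addresses pattern mining under arbitrary conditions in terms of the definition of conditional patterns, their properties, and the conditional pattern mining algorithm. Conditional pattern mining makes the discovery of new knowledge and new regularities for arbitrary patterns in a database faster and more effective, and it is closer to how knowledge is discovered in real-world knowledge mining.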

8.
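Research on metarule-guided knowledge discovery methods   Total citations: 4, self-citations: 1, citations by others: 3
Traditional knowledge discovery methods lack focus in mining and are inefficient. They produce a huge number of rules that require complex knowledge filtering, and the rules are expressed in low-level raw data, which makes them hard to understand. A metarule is a way of representing the patterns that mining results should take, and it is an important means of incorporating background knowledge into the knowledge discovery process and of improving both the interestingness of the results and the mining speed. This paper studies the use of concepts to represent relationships among data in order to improve the comprehensibility of rules. Combining concepts with metarules, it proposes a concept-based, metarule-guided knowledge discovery method and gives a method for generating concepts and a method for constructing metarules.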

9.
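Application of a parallel decision tree algorithm based on granular computing   Total citations: 1, self-citations: 0, citations by others: 1
Traditional decision tree classification algorithms cannot effectively handle massive data mining. To address this, the paper combines the parallel processing model MapReduce with granular computing and studies a parallel method for ID3 decision tree classification. Binary information granule vectors of attributes are built from the binary representation of information granules, and a binary information granule association matrix representation of the data set is given. Based on this association matrix, a method for computing the information gain of an attribute is proposed, and a MapReduce-based parallel granular computing decision tree classification algorithm is designed. Tests on standard data sets and on a real lightning data set from the meteorological domain verify the effectiveness of the algorithm.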
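One way to read the binary-granule idea is that each attribute value and each class label becomes a bit vector over the records, so the counts needed for ID3's information gain reduce to bitwise AND followed by a population count. The sketch below follows that reading on a single machine. The paper's association matrix layout and its MapReduce decomposition are not given in the abstract, so the representation (Python integers as bit vectors) and all names are assumptions.

```python
import math


def popcount(mask):
    """Number of set bits, i.e. number of records in the granule."""
    return bin(mask).count("1")


def granule(column, value):
    """Bit vector with bit r set when record r has the given value."""
    mask = 0
    for row, v in enumerate(column):
        if v == value:
            mask |= 1 << row
    return mask


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(attribute, labels):
    """ID3 information gain computed from granule intersections."""
    n = len(labels)
    class_masks = [granule(labels, c) for c in sorted(set(labels))]
    gain = entropy([popcount(m) for m in class_masks])
    for value in sorted(set(attribute)):
        g = granule(attribute, value)
        # g & m selects the records that have this value AND this class
        gain -= popcount(g) / n * entropy([popcount(g & m) for m in class_masks])
    return gain


labels = ["no", "no", "yes", "yes"]
gain_informative = information_gain(["sunny", "sunny", "rain", "rain"], labels)
gain_useless = information_gain(["a", "b", "a", "b"], labels)
```

Because each attribute's gain depends only on that attribute's granules and the class granules, the per-attribute computations are independent, which is what makes the method a plausible fit for a map step.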

10.
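A Pareto strength value evolutionary algorithm for solving multi-objective optimization problems   Total citations: 2, self-citations: 0, citations by others: 2
In recent years, solving multi-objective optimization problems has become an important research direction in evolutionary computation, and multi-objective evolutionary algorithms based on the concept of Pareto optimality are a current research focus. The goal of research on multi-objective evolutionary algorithms is to make the population converge quickly to, and spread evenly over, the non-inferior optimal region of the problem. This paper defines and uses a sparseness density to keep the individuals of the population evenly distributed, and incorporates both the Pareto strength value and the sparseness density of an individual into the definition of its fitness. Experiments on test functions verify the feasibility and effectiveness of the algorithm.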
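As a rough illustration of a dominance-based strength value, the sketch below counts how many population members each individual dominates. The abstract does not define the paper's Pareto strength value or its sparseness density, so this SPEA-style reading, the minimization convention, and the function names are all assumptions; the density term is left out.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization assumed):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_strength(population):
    """Assumed strength value: number of individuals each one dominates."""
    return [sum(dominates(p, q) for q in population) for p in population]


population = [(1, 4), (2, 2), (3, 3), (4, 1)]
strengths = pareto_strength(population)
```

In this toy population only (2, 2) dominates another point, namely (3, 3); the remaining three points are mutually non-dominated.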

11.
In this article, we propose a new approach to the virus DNA-based evolutionary algorithm (VDNA-EA) to implement self-learning of a class of Takagi-Sugeno (T-S) fuzzy controllers. The fuzzy controllers use T-S fuzzy rules with linear consequents, generalized input fuzzy sets, Zadeh fuzzy logic and operators, and a generalized defuzzifier. The fuzzy controllers are proved to be nonlinear proportional-integral (PI) controllers with variable gains. The fuzzy rules are discovered automatically, and the design parameters in the input fuzzy sets and the linear rule consequents are optimized simultaneously by the VDNA-EA. The VDNA-EA uses a VDNA encoding method, derived from the structure of the VDNA, to encode the design parameters of the fuzzy controllers. We use the frameshift decoding method of the VDNA to decode the DNA chromosome into the design parameters of the fuzzy controllers. In addition, the gene transfer operation and the bacterial mutation operation, inspired by a microbial evolution phenomenon, are introduced into the VDNA-EA. Moreover, frameshift mutation operations based on DNA genetic operations are used in the VDNA-EA to add and delete fuzzy rules adaptively. Our encoding method can significantly shorten the code length of the DNA chromosomes and improve the encoding efficiency. The length of the chromosome is variable, and it is easy to insert and delete parts of the chromosome. The encoding is suitable for complex knowledge representation and makes it easy to introduce gene-level genetic operations into the VDNA-EA. We show how to apply the new method to self-learn a T-S fuzzy controller for the control of a nonlinear system. The fuzzy controller can be constructed automatically by the VDNA-EA. Computer simulation results indicate that the new method is effective and the designed fuzzy controller is satisfactory. © 2003 Wiley Periodicals, Inc.

12.
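Associative classification of Chinese text based on information gain   Total citations: 1, self-citations: 0, citations by others: 1
Associative classification is a classification technique that mines association rules from a training set and uses them to predict the class attribute of new data. Recent studies show that associative classification achieves higher accuracy than traditional classification methods such as C4.5. Existing associative classification methods built on the support-confidence framework merely select frequent literals to construct classification rules and ignore how useful those literals are for classification. This paper proposes a new algorithm, ACIG, which combines information gain with FoilGain to select the literals of rules in Chinese text so as to improve their usefulness for classification. Experimental results show that ACIG achieves higher accuracy than other associative classification algorithms (CPAR).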
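For readers unfamiliar with FoilGain, the measure can be sketched as below. This is the commonly used form of FOIL gain for adding one literal to a rule, not ACIG's combined criterion, which the abstract does not spell out. The parameter names are hypothetical: `p0` and `n0` are the positive and negative examples covered by the rule before the literal is added, and `p1` and `n1` are those still covered afterwards.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """FOIL gain of adding a literal to a rule.

    gain = p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))
    A literal that leaves no positive examples covered gains nothing.
    """
    if p1 == 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


# A literal that keeps 2 of 4 positives and removes all 4 negatives.
gain = foil_gain(p0=4, n0=4, p1=2, n1=0)
```

The factor `p1` rewards literals that keep many positive examples, while the log difference rewards literals that raise the rule's purity, which is why a frequent but uninformative word scores poorly.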

13.
The knowledge-based artificial neural network (KBANN) is composed of phases involving the expression of domain knowledge, the abstraction of domain knowledge in neural networks, the training of neural networks, and finally, the extraction of rules from trained neural networks. The KBANN attempts to open up the neural network black box and generates symbolic rules with (approximately) the same predictive power as the neural network itself. An advantage of using KBANN is that the neural network considers the contribution of the inputs towards classification as a group, while rule-based algorithms like C5.0 measure the individual contribution of the inputs one at a time as the tree is grown. The knowledge consolidation model (KCM) combines the rules extracted using KBANN (NeuroRule), the frequency matrix (which is similar to the Naïve Bayesian technique), and the C5.0 algorithm. The KCM can effectively integrate multiple rule sets into one centralized knowledge base. The cumulative rules from the other single models can improve overall performance, as they can reduce the error term and increase R-square. The key idea in the KCM is to combine a number of classifiers such that the resulting combined system achieves higher classification accuracy and efficiency than the original single classifiers. The aim of KCM is to design a composite system that outperforms any individual classifier by pooling together the decisions of all classifiers. Another advantage of KCM is that it does not need memory space to store the dataset, as only the extracted knowledge is needed to build the integrated model. It can also reduce the costs of storage allocation, memory, and time scheduling. In order to verify the feasibility and effectiveness of KCM, a personal credit rating dataset provided by a local bank in Seoul, Republic of Korea is used in this study. The results from the tests show that the performance of KCM is superior to that of the other single models such as multiple discriminant analysis, logistic regression, frequency matrix, neural networks, decision trees, and NeuroRule. Moreover, our model is superior to a previous algorithm for the extraction of rules from general neural networks.

14.
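Evolutionary algorithms are heuristic algorithms that simulate biological evolution in nature. They have good search ability and flexibility and are widely used to solve complex optimization problems, but they assume by default that no prior knowledge about the problem is available. Problems rarely exist in isolation, however, and the experience accumulated in solving one task can be transferred to other related tasks. Evolutionary transfer optimization algorithms learn and transfer knowledge from related domains to achieve better optimization efficiency and performance. This paper introduces the basic categories of evolutionary transfer optimization algorithms and reviews and analyzes the core strategies, strengths, and weaknesses of mainstream algorithms from five perspectives: source task selection, knowledge transfer, reducing differences between search spaces, evolutionary algorithm search, and evolutionary resource allocation. Literature on evolutionary transfer optimization from 2014 to 2021 is retrieved from CNKI and the WOS platform, and knowledge graphs are used for data mining, information processing, knowledge measurement, and graph drawing. Based on the development trends of evolutionary transfer optimization and an analysis of experience, the main challenges and future research directions are summarized.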

15.
16.
This article deals with two key problems of data mining: the automation of the data mining process and the integration of human domain experts. We show how an evolutionary algorithm (EA) can be used to optimize radial basis function (RBF) neural networks used for classification tasks. First, input features are chosen from a set of possible input features (feature selection). Second, the number of hidden neurons is adapted (model selection). It is known that interpretable (fuzzy-type) rule sets may be extracted from RBF networks. We show how appropriate training algorithms for RBF networks and penalty terms in the fitness function of the EA may improve the understandability of the extracted rules. The properties of our approach are set out by means of two industrial application examples (process identification and quality control).

17.
Artificial neural networks (ANNs) are mathematical models inspired by the biological nervous system. They are able to predict, learn from experience, and generalize from previous examples. An important drawback of ANNs is their very limited explanation capability, mainly because the knowledge embedded within ANNs is distributed over the activations and the connection weights. Therefore, one of the main challenges in recent decades has been to extract classification rules from ANNs. This paper presents a novel approach to extracting fuzzy classification rules (FCR) from ANNs, because fuzzy rules are more interpretable than crisp rules and cope better with pervasive uncertainty and vagueness. A soft computing based algorithm is developed to generate fuzzy rules based on a data mining tool (DIFACONN-miner), which was recently developed by the authors. The fuzzy DIFACONN-miner algorithm can extract fuzzy classification rules from datasets containing both categorical and continuous attributes. Experimental research on benchmark datasets and comparisons with other fuzzy rule based classification (FRBC) algorithms have shown that the proposed algorithm yields high classification accuracies and comprehensible rule sets.

18.
The comprehensibility aspect of rule discovery is of emerging interest in the realm of knowledge discovery in databases. Of the many cognitive and psychological factors relating to the comprehensibility of knowledge, we focus on the use of human amenable concepts as a representation language in expressing classification rules. Existing work in neural logic networks (or neulonets) provides impetus for our research; its strength lies in its ability to learn and represent complex human logic in decision-making using symbolic-interpretable net rules. A novel technique is developed for neulonet learning by composing net rules using genetic programming. Coupled with a sequential covering approach for generating a list of neulonets, the straightforward extraction of human-like logic rules from each neulonet provides an alternate perspective on the greater extent of knowledge that can potentially be expressed and discovered, while the entire list of neulonets together constitutes an effective classifier. We show how the sequential covering approach is analogous to association-based classification, leading to the development of an association-based neulonet classifier. An empirical study shows that associative classification integrated with the genetic construction of neulonets performs better than general association-based classifiers in terms of higher accuracies and smaller rule sets. This is due to the richness in logic expression inherent in the neulonet learning paradigm.

19.
Data mining usually refers to the methodologies and tools for the efficient discovery of new knowledge from databases. In this paper, a genetic algorithm (GA) based approach to assessing breast cancer patterns is proposed. It extracts decision rules, including the predictors and the corresponding inequalities and threshold values, simultaneously, so as to build a decision-making model with maximum prediction accuracy. Many early studies of breast cancer diagnostic problems used statistical techniques. As the diagnosis of breast cancer is highly nonlinear in nature, it is hard to develop a comprehensive model that takes all the independent variables into account using conventional statistical approaches. Recently, numerous studies have demonstrated that neural networks (NNs) are more reliable than the traditional statistical approaches and the dynamic stress method. The usefulness of NNs has been reported in the literature, but the main obstacle lies in building and using the model, in which the classification rules are hard to realize. We compared our results against commercial data mining software, and we show experimentally that the proposed rule extraction approach is promising for improving prediction accuracy and enhancing modeling simplicity. In particular, our approach is capable of extracting rules that can be developed into a computer model, like an expert system, for predicting or classifying breast cancer potential.

20.
Elicitation of classification rules by fuzzy data mining   Total citations: 1, self-citations: 0, citations by others: 1
Data mining techniques can be used to find potentially useful patterns in data and to ease the knowledge acquisition bottleneck in building prototype rule-based systems. Based on the partition methods presented in the simple-fuzzy-partition-based method (SFPBM) proposed by Hu et al. (Comput. Ind. Eng. 43(4) (2002) 735), the aim of this paper is to propose a new fuzzy data mining technique consisting of two phases to find fuzzy if-then rules for classification problems: one to find frequent fuzzy grids by using a pre-specified simple fuzzy partition method to divide each quantitative attribute, and the other to generate fuzzy classification rules from the frequent fuzzy grids. To improve the classification performance of the proposed method, we incorporate the adaptive rules proposed by Nozaki et al. (IEEE Trans. Fuzzy Syst. 4(3) (1996) 238) into our method to adjust the confidence of each classification rule. Regarding classification generalization ability, the simulation results on the iris data demonstrate that the proposed method may effectively derive fuzzy classification rules from training samples.
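A simple fuzzy partition of a quantitative attribute can be sketched with evenly spaced triangular membership functions. This is a minimal illustration assuming the widely used symmetric triangular form with K fuzzy sets over an interval; the abstract does not state the exact membership functions used by the paper, and the function name and arguments are hypothetical.

```python
def memberships(x, lo, hi, k):
    """Membership of x in each of k evenly spaced triangular fuzzy sets on [lo, hi].

    Set i is centred at lo + i * width, with width = (hi - lo) / (k - 1),
    and has membership max(1 - |x - centre| / width, 0). Requires k >= 2.
    """
    width = (hi - lo) / (k - 1)
    return [max(1.0 - abs(x - (lo + i * width)) / width, 0.0) for i in range(k)]


# Three sets ("small", "medium", "large") on [0, 10]; 2.5 lies halfway
# between the centres of the first two sets.
degrees = memberships(2.5, 0.0, 10.0, 3)
```

With this form the memberships of any point inside the interval sum to 1, so each record contributes one unit of fuzzy support spread across neighbouring grid cells.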
