Similar Documents
20 similar documents found (search time: 218 ms)
1.
A difference measure for condition attribute values is defined, and a new classification-rule induction algorithm based on it is presented. Compared with the standard rough set method, the new algorithm has three key features: it performs no attribute reduction step in advance; it generates a rule for each example; and the rules in its rule set have intersecting coverage, that is, they may cover some of the same examples and vote on them. Experimental results on 70 datasets show that the new algorithm achieves competitive classification performance.

2.
Reduct and attribute order   (Cited 14 times: 2 self-citations, 12 by others)
Based on the principle of the discernibility matrix, a reduction algorithm with attribute order has been developed, and its solution has been proved complete for reducts and unique for a given attribute order. Called the reduct problem, this algorithm can be regarded as a mapping R = Reduct(S) from the attribute order space O to the reduct space R for an information system (U, C ∪ D), where U is the universe and C and D are the sets of condition and decision attributes, respectively. This paper focuses on the reverse problem S = Order(R): for a given reduct R of an information system, determine the solutions of S = Order(R) in the space O. First, it is proved that there is at least one attribute order S such that S = Order(R). Then, decision rules are proposed that can be used directly to decide whether a pair of attribute orders has the same reduct. The main method rests on the fact that one attribute order can be transformed into another by moving attributes a limited number of times, so the decision for a pair of attribute orders can be reduced to decisions for a sequence of neighboring pairs of attribute orders. Accordingly, the basic theorem on neighboring pairs of attribute orders is proved first, and the decision theorem of attribute orders, the second attribute theorem, is then proved.
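As a toy illustration of the discernibility-matrix machinery behind Reduct(S), the sketch below computes the discernibility sets of a small decision table and derives a reduct for a given attribute order by labeling each set with its highest-priority attribute. The toy table, function names, and the greedy labeling rule are illustrative assumptions, not the paper's exact algorithm.

```python
def discernibility_sets(table, cond, dec):
    """All sets of condition attributes that discern object pairs
    with different decision values (nonempty discernibility-matrix entries)."""
    sets = []
    for i in range(len(table)):
        for j in range(i + 1, len(table)):
            if table[i][dec] != table[j][dec]:
                d = frozenset(a for a in cond if table[i][a] != table[j][a])
                if d:
                    sets.append(d)
    return sets

def reduct_for_order(table, order, dec):
    """Scan attributes in priority order; keep attribute a if it is the
    highest-priority attribute of some still-uncovered discernibility set."""
    remaining = discernibility_sets(table, order, dec)
    reduct = []
    for k, a in enumerate(order):
        lower = set(order[k + 1:])
        needed = [s for s in remaining if a in s]
        # s <= lower | {a} means every attribute of s is a or lower priority,
        # i.e. a is the highest-priority attribute (the label) of s
        if any(s <= (lower | {a}) for s in needed):
            reduct.append(a)
            remaining = [s for s in remaining if a not in s]
    return reduct
```

Different attribute orders yield different reducts of the same table, which is exactly the Reduct(S) mapping the abstract describes.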

3.
A biclustering algorithm extends conventional clustering techniques to extract all of the meaningful subgroups of genes and conditions in the expression matrix of a microarray dataset. However, such algorithms are very sensitive to input parameters and scale poorly. This paper proposes a scalable unsupervised biclustering framework, SUBic, for effectively finding high-quality constant-row biclusters in an expression matrix. A one-dimensional clustering algorithm partitions the attributes (the columns of the expression matrix) into disjoint groups based on the similarity of expression values. These groups form a set of short transactions, which are used to discover a set of frequent itemsets, each corresponding to a bicluster. Because a bicluster may still include an attribute whose expression values are not similar enough to the others, a bicluster refinement step removes such attributes based on the distribution of expression values, enhancing the quality of the bicluster. The performance of the proposed method is comparatively analyzed through a series of experiments on synthetic and real datasets.
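The columns-to-transactions idea can be sketched in a few lines: binning each column's values (a crude stand-in for the paper's one-dimensional clustering) turns every row into a transaction of (column, bin) items, and frequent itemsets over these transactions mark candidate biclusters. The bin width, function names, and pairs-only mining are illustrative assumptions, not SUBic itself.

```python
from collections import Counter
from itertools import combinations

def column_groups(matrix, width=1.0):
    """Encode each row as a transaction of (column, bin) items,
    binning expression values into intervals of the given width."""
    return [frozenset((j, int(v // width)) for j, v in enumerate(row))
            for row in matrix]

def frequent_pairs(transactions, minsup):
    """Frequent 2-itemsets; the rows supporting a frequent set of
    (column, bin) items outline a candidate bicluster."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {p for p, c in counts.items() if c >= minsup}
```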

4.
Decision trees have three main disadvantages: reduced performance when the training set is small; rigid decision criteria; and the fact that a single "uncharacteristic" attribute might "derail" the classification process. In this paper we present ConfDTree (Confidence-Based Decision Tree), a post-processing method that enables decision trees to better classify outlier instances. The method, which can be applied to any decision tree algorithm, uses easy-to-implement statistical tools (confidence intervals and two-proportion tests) to identify hard-to-classify instances and to propose alternative routes. The experimental study indicates that the proposed post-processing method consistently and significantly improves the predictive performance of decision trees, particularly for small, imbalanced, or multi-class datasets, for which an average improvement of 5%-9% in AUC is reported.
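The two statistical tools the abstract names are standard; a minimal sketch of how they might flag a hard-to-classify leaf follows. The threshold rule and function names are assumptions for illustration, not ConfDTree's actual procedure.

```python
import math

def prop_ci(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a leaf's
    class proportion."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def two_prop_z(s1, n1, s2, n2):
    """Two-proportion z statistic comparing the class proportions
    of two candidate leaves (pooled standard error)."""
    p1, p2 = s1 / n1, s2 / n2
    p = (s1 + s2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def confident_leaf(successes, n, threshold=0.5):
    """Treat a leaf's prediction as confident only when the whole CI
    lies on one side of the decision threshold; otherwise the instance
    is 'hard to classify' and an alternative route could be considered."""
    lo, hi = prop_ci(successes, n)
    return hi < threshold or lo > threshold
```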

5.
In this paper, we investigate a new problem, misleading classification, in which each test instance is associated with an original class and a misleading class. The data owner's goal is to form the training set out of candidate instances such that the data miner will be misled into classifying those test instances into their misleading classes rather than their original classes. We discuss two cases of misleading classification. For the case where the classification algorithm is unknown to the data owner, a KNN-based Ranking Algorithm (KRA) is proposed to rank all candidate instances by the similarities between candidate and test instances. For the case where the classification algorithm is known, we propose a Greedy Ranking Algorithm (GRA), which evaluates each candidate instance by building a classifier to predict the test set. In addition, we show how to accelerate GRA incrementally when naive Bayes is employed as the classification algorithm. Experiments on 16 UCI data sets indicate that the candidate instances ranked by KRA achieve promising leaking and misleading rates. When the classification algorithm is known, GRA dramatically outperforms KRA in terms of leaking and misleading rates, though it requires more running time.
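A similarity-based ranking in the spirit of KRA can be sketched briefly; the k-nearest-distance score and the names here are assumptions, since the abstract does not give the exact ranking function.

```python
import math

def dist(x, y):
    """Euclidean distance between two numeric instances."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def kra_rank(candidates, tests, k=1):
    """Rank candidate indices by the mean distance to their k nearest
    test instances: smaller mean distance = ranked earlier, on the
    intuition that nearby candidates influence the test set most."""
    def score(c):
        ds = sorted(dist(c, t) for t in tests)
        return sum(ds[:k]) / k
    return sorted(range(len(candidates)), key=lambda i: score(candidates[i]))
```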

6.
One view of finding a personalized reduct of an information system holds that an attribute order can serve as a semantic representation of user requirements; the problem of finding personalized solutions can then be transformed into computing the reduct for an attribute order. The second attribute theorem describes the relationship between the set of attribute orders and the set of reducts, and can be used to transform the problem of searching for solutions that meet user requirements into the problem of modifying a reduct based on a given attribute order. An algorithm based on the second attribute theorem, computing on the discernibility matrix, has time complexity O(n^2 × m), where n is the number of objects and m the number of attributes of the information system. This paper presents another effective second attribute algorithm that facilitates the use of the second attribute theorem by computing on the tree expression of an information system. The time complexity of the new algorithm is linear in n, and the algorithm is proved equivalent to the one on the discernibility matrix.

7.
The fuzzy c-means (FCM) algorithm is an important clustering method in pattern recognition, and the fuzziness parameter m in FCM is a key parameter that can significantly affect the clustering result. A cluster validity index (CVI) is a criterion function for validating clustering results and thereby determining the optimal cluster number of a data set. From the perspective of cluster validation, we propose a novel method to select the optimal value of m in FCM, using four well-known CVIs for fuzzy clustering: XB, VK, VT, and SC. In this method, the optimal value of m is the one at which the CVIs reach their minimum values. Experimental results on four synthetic and four real data sets suggest that a reasonable range for m is [2, 3.5] and that the optimal interval is [2.5, 3].
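A minimal version of this selection loop, pairing a plain FCM implementation with the Xie-Beni (XB) index, might look as follows. The random initialization, iteration count, and candidate grid are assumptions, and only XB is shown; the paper also uses the VK, VT, and SC indices.

```python
import numpy as np

def fcm(X, c, m, iters=100, seed=0):
    """Plain fuzzy c-means: returns cluster centers V and the
    (c x n) membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)
    for _ in range(iters):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)          # center update
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-10
        U = D ** (-2 / (m - 1))                                # membership update
        U /= U.sum(axis=0)
    return V, U

def xie_beni(X, V, U, m):
    """Xie-Beni validity index: compactness over separation, smaller is better."""
    D2 = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) ** 2
    num = ((U ** m) * D2).sum()
    sep = min(np.sum((V[i] - V[j]) ** 2)
              for i in range(len(V)) for j in range(len(V)) if i != j)
    return num / (len(X) * sep)

def best_m(X, c, grid):
    """Pick the fuzziness value whose clustering minimizes the CVI."""
    return min(grid, key=lambda m: xie_beni(X, *fcm(X, c, m), m))
```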

8.
The intersection of N halfplanes is a basic problem in computational geometry and computer graphics. The optimal offline algorithm for this problem runs in O(N log N) time. In this paper, an optimal online algorithm that also runs in O(N log N) time is presented. The main idea of the algorithm is to give a new definition of the left side of a given line, to assign an order to the points of a convex polygon, and then to use binary search over the ordered vertex set. The data structure used in the algorithm is no more complex than an array.
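For contrast, here is the textbook naive baseline: maintaining the intersection online by clipping a large bounding box against each arriving halfplane (one Sutherland-Hodgman step per halfplane). This costs O(V) per update rather than the paper's O(log N) binary search; the bounding-box trick and the representation a·x + b·y ≤ c are assumptions of this sketch.

```python
def clip(poly, a, b, c):
    """Clip a convex polygon (list of (x, y) vertices, CCW) against the
    halfplane a*x + b*y <= c: one Sutherland-Hodgman step."""
    out = []
    n = len(poly)
    for i in range(n):
        p, q = poly[i], poly[(i + 1) % n]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:              # p is inside the halfplane
            out.append(p)
        if fp * fq < 0:          # edge p->q crosses the boundary line
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def intersect_halfplanes(halfplanes, bound=1e6):
    """Online intersection: start from a large box and clip it by each
    halfplane as it arrives; an empty polygon means empty intersection."""
    poly = [(-bound, -bound), (bound, -bound), (bound, bound), (-bound, bound)]
    for a, b, c in halfplanes:
        poly = clip(poly, a, b, c)
        if not poly:
            break
    return poly
```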

9.
A novel method for timing optimization in global routing, named critical-network-based (CNB), is presented in this paper. Its essence differs from that of the typical existing methods, nets-based (NB) and critical-path-based (CPB). The main contribution of this paper is that the CNB delay-reduction method is more efficient than the typical existing ones: the new method makes it possible to reduce delay from an overall view of the design. Based on CNB, a timing optimization algorithm for global routing is implemented and tested on the Microelectronics Center of North Carolina (MCNC) benchmarks, and the experimental results are compared between this algorithm and the existing ones. The results show that the algorithm controls delay efficiently.

10.
A Reduction Algorithm Meeting Users' Requirements   (Cited 9 times: 0 self-citations, 9 by others)
Generally, a database encompasses various kinds of knowledge and is shared by many users, and different users may prefer different kinds of knowledge. It is therefore important for a data mining algorithm to output specific knowledge according to users' current requirements (preferences). We call this kind of data mining requirement-oriented knowledge discovery (ROKD). When rough set theory is used in data mining, the ROKD problem is to find a reduct and corresponding rules that interest the user. Since reducts and rules are generated in the same way, this paper concerns itself only with finding a particular reduct. The user's requirement is described by an order over the attributes, called an attribute order, which expresses the importance of the attributes to the user: more important attributes are placed before less important ones. The problem then becomes finding a reduct that includes the attributes placed earlier in the attribute order. An approach to this problem is proposed, and its completeness for reducts is proved. After that, three kinds of attribute order are developed to describe various user requirements.

11.
赵冬梅, 李红. 《计算机应用》 (Journal of Computer Applications), 2017, 37(4): 1008-1013
The quality of network security situation element extraction is crucial to the accuracy of network security situation assessment, yet most existing extraction methods depend on prior knowledge and are ill-suited to network security situation data. To improve both the quality and the efficiency of element extraction, a parallel reduction algorithm based on an attribute-significance matrix is proposed. It introduces the idea of parallel reduction into classical rough sets: without affecting classification, a single decision information table is extended to several, attribute significance is computed with conditional entropy, and redundant attributes are removed according to the reduction rules, yielding efficient extraction of network security situation elements. To verify the algorithm's efficiency, classification prediction was performed with Weka: on the NSL-KDD data set, building a classification model on the attributes reduced by this algorithm took 16.6% less time than using all attributes. Compared with three existing element-extraction algorithms (a Genetic Algorithm (GA), a Greedy Search Algorithm (GSA), and Attribute Reduction based on Conditional Entropy (ARCE)), the proposed algorithm achieves a higher recall rate and a lower false-alarm rate. The experimental results show that the data reduced by this algorithm has better classification performance, and that network security situation elements are extracted efficiently.
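The conditional-entropy significance at the heart of such reductions can be sketched as a greedy forward selection: repeatedly add the condition attribute that most lowers H(D | R) until the reduct explains the decision as well as the full attribute set. This is a common single-table formulation; the toy table, names, and stopping rule are assumptions, not the paper's parallel algorithm.

```python
import math
from collections import Counter

def cond_entropy(rows, attrs, dec):
    """H(D | attrs): conditional entropy of the decision attribute
    given a set of condition attributes."""
    groups, joint = Counter(), Counter()
    for r in rows:
        key = tuple(r[a] for a in attrs)
        groups[key] += 1
        joint[(key, r[dec])] += 1
    n = len(rows)
    h = 0.0
    for (key, _), cnt in joint.items():
        h -= (cnt / n) * math.log2(cnt / groups[key])
    return h

def entropy_reduct(rows, cond, dec):
    """Greedy forward selection driven by conditional-entropy
    significance: stop once H(D | reduct) matches H(D | all attributes)."""
    target = cond_entropy(rows, cond, dec)
    red = []
    while cond_entropy(rows, red, dec) > target + 1e-12:
        best = min((a for a in cond if a not in red),
                   key=lambda a: cond_entropy(rows, red + [a], dec))
        red.append(best)
    return red
```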

12.
A method for handling massive incomplete decision tables is proposed. Using mutual-information-based attribute significance as heuristic information, a genetic algorithm reduces the condition attributes of the original incomplete decision table, producing a decision table that still contains missing values, called the optimized decision table. Using only the information in the original decision table, consistent decision rules are extracted from the optimized table through attribute extension, without computing the missing values. On eight UCI data sets the method outperforms the EMAV method, making it an effective way to extract rules from massive incomplete decision tables.

13.
Attribute reduction aims to remove unnecessary condition attributes and is one of the key problems in knowledge discovery. This paper presents an improved rough-set-based heuristic algorithm (IMSA) with a new heuristic function (WSH). The function takes into account the quality of all hidden rule sets as well as the weights of the related rule sets. Without increasing the time complexity of the algorithm, it resolves the case, which the MSA algorithm cannot handle, in which several attributes share the same MSH value. Experimental analysis shows that the algorithm is effective.

14.
To address the high time complexity of the C4.5 decision-tree algorithm when handling continuous-valued attributes, a new tree-construction method is proposed. The Pearson correlation coefficient between attributes is used to reduce the attribute set of the data; combined with the information gain ratio, an optimal attribute subset is retained so that the subset contains no redundant attributes; and boundary-point detection improves the threshold-splitting step in the discretization of continuous attributes, correcting the computation of the information gain ratio. A series of comparative experiments on UCI data sets, run on the PyCharm platform, shows that the improved C4.5 algorithm builds decision trees about 50% faster and improves accuracy by about 2%, largely resolving the original C4.5 algorithm's bias toward continuous-valued attributes in attribute selection.
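The Pearson pre-reduction step can be sketched as a simple redundancy filter over attribute columns: drop any column that is too strongly correlated with one already kept. The 0.95 threshold, keep-first policy, and toy data are assumptions; the gain-ratio and boundary-point refinements from the abstract are omitted.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two numeric columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_redundant(columns, threshold=0.95):
    """Keep a (name, values) column only if its |r| with every
    already-kept column stays below the threshold."""
    kept = []
    for name, col in columns:
        if all(abs(pearson(col, kcol)) < threshold for _, kcol in kept):
            kept.append((name, col))
    return [name for name, _ in kept]
```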

15.
Classical rough set theory mainly studies attribute reduction and decision-rule acquisition for single-level decision tables, yet tree-structured taxonomies of attribute values are ubiquitous in practice. For condition attributes whose values carry such taxonomies, a multi-level rough set model is proposed based on the full-subtree generalization mode, and the properties of decision tables under generalization spaces at different levels are analyzed. Combining this with positive-region-based attribute reduction, the concept of attribute-value generalization reduction is introduced, the relationship between the two reductions is discussed, and computing a generalization reduction is proved to be NP-hard. A positive-region-based heuristic generalization reduction algorithm is therefore proposed: using a top-down stepwise-refinement search strategy, it generalizes all attribute values of a decision table to their best levels while keeping the positive region of the original table unchanged. Theoretical analysis and simulation experiments show that generalization reduction raises the level and generalization ability of knowledge discovery.

16.
Attribute reduction of decision tables is an important problem in rough set theory. Classical reduction methods select an optimal reduct of the condition attributes from the standpoint of preserving the ability to partition the universe. Starting instead from the correlation between decision and condition attributes, this paper combines decision-table attribute reduction with the correspondence analysis method of traditional statistics and proposes a measure, called projection discernibility, that quantifies the dependency between decision and condition attributes; an attribute reduction algorithm for decision tables is developed on this basis. A simple example illustrates the correctness of the method.

17.
A Heuristic Attribute Reduction Algorithm Based on the Discernibility Matrix   (Cited 2 times: 1 self-citation, 1 by others)
To obtain better relative attribute reductions of a decision system, this paper proposes a heuristic attribute reduction algorithm based on the discernibility matrix. Built on the computation of the discernibility matrix, the algorithm considers not only the mutual information between the selected condition attributes and the decision attribute but also the distribution of their values. A new information-theoretic measure of attribute significance is defined and used as the heuristic, from which the attribute reduct is finally obtained. Examples show that the algorithm reduces a decision system effectively, yields satisfactory reducts, and produces fewer decision rules after reduction.

18.
A Heuristic Algorithm for Attribute Reduction Based on Rough Set Theory   (Cited 9 times)
Attribute reduction is one of the key problems in knowledge discovery. To obtain a minimal relative reduct of the attributes in a decision table effectively, a new operator is constructed on the basis of rough set theory, and the information-theoretic significance of an attribute is used as heuristic information to describe the influence of the knowledge provided by the condition attributes on the decision attribute. Using a breadth-first search strategy, a new heuristic attribute reduction algorithm is proposed: starting from the full set of condition attributes and applying the operator, the minimal relative reduct is obtained by decreasing approximation toward the attribute core. Example analysis shows that the algorithm reduces decision-table attributes effectively.

19.
Data mining is an important method of data analysis, and the decision tree is one of its main techniques; how to construct an optimal decision tree concerns many researchers. This paper uses rough set methods to perform attribute reduction and attribute-value reduction on a decision table, removing redundant information irrelevant to the decision. An approximately optimal decision tree is then constructed on the simplified decision table; the paper gives the generation algorithm for the approximately optimal tree and illustrates it with an example.

20.
Fuzzy rough sets generalize crisp rough sets to deal with data sets that have real-valued attributes. A primary use of fuzzy rough set theory is attribute reduction for decision systems with numerical condition attribute values and crisp (symbolic) decision attributes. In this paper we define inconsistent fuzzy decision systems and their reductions, and develop discernibility-matrix-based algorithms to find the reducts. Two heuristic algorithms are then developed, and a comparative study with existing attribute reduction algorithms for fuzzy rough sets is provided. The method proposed in this paper can handle decision systems with numerical condition attribute values and fuzzy, rather than crisp, decision attributes. Experimental results indicate that our algorithm for attribute reduction with general fuzzy rough sets is feasible and valid.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)    京ICP备09084417号-23
