Similar literature
20 similar documents found
1.
Inductive learning systems can be effectively used to acquire classification knowledge from examples. Many existing symbolic learning algorithms can be applied in domains with continuous attributes when integrated with a discretization algorithm that transforms the continuous attributes into ordered discrete ones. In this paper, a new information-theoretic discretization method optimized for supervised learning is proposed and described. The approach seeks to maximize the mutual dependence, as measured by the interdependence redundancy, between the discrete intervals and the class labels, and it can automatically determine the most preferred number of intervals for an inductive learning application. The method has been tested on a number of inductive learning examples, which show that the class-dependent discretizer can significantly improve the classification performance of many existing learning algorithms in domains containing numeric attributes.
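The abstract does not give its formula, so the following is only a minimal sketch of how a candidate discretization could be scored. It assumes the interdependence redundancy is the mutual information between intervals and class labels divided by their joint entropy, R = I(X; C) / H(X, C). The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(counts, total):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def interdependence_redundancy(intervals, labels):
    """Assumed definition: R = I(X; C) / H(X, C) for interval indices X and labels C."""
    n = len(labels)
    h_x = entropy(Counter(intervals).values(), n)
    h_c = entropy(Counter(labels).values(), n)
    h_xc = entropy(Counter(zip(intervals, labels)).values(), n)
    mutual_info = h_x + h_c - h_xc
    return mutual_info / h_xc if h_xc > 0 else 0.0

def discretize(values, cut_points):
    """Map each value to the index of the interval it falls in."""
    return [sum(v > cut for cut in cut_points) for v in values]

# Toy data (hypothetical): the class changes between 1.9 and 3.1.
values = [1.0, 1.2, 1.9, 3.1, 3.5, 4.0]
labels = ["a", "a", "a", "b", "b", "b"]
good = interdependence_redundancy(discretize(values, [2.5]), labels)
poor = interdependence_redundancy(discretize(values, [1.1]), labels)
print(good, poor)  # the cut at 2.5 scores higher than the cut at 1.1
```

A class-dependent discretizer in this spirit would compare such scores across candidate cut sets and keep the best one; the search strategy itself is not described in the abstract and is not sketched here.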

2.
The fuzzy rough set generalizes the crisp rough set to handle data sets with real-valued attributes. A primary use of fuzzy rough set theory is attribute reduction for decision systems with numerical conditional attribute values and crisp (symbolic) decision attributes. In this paper we define inconsistent fuzzy decision systems and their reductions, and develop discernibility-matrix-based algorithms to find reducts. Finally, two heuristic algorithms are developed, and a comparison with existing algorithms for attribute reduction with fuzzy rough sets is provided. The proposed method can deal with decision systems that have numerical conditional attribute values and fuzzy, rather than crisp, decision attributes. Experimental results indicate that our algorithm for attribute reduction with general fuzzy rough sets is feasible and valid.

3.
Data mining is frequently applied to data sets with missing attribute values. A new approach to missing attribute values, called closest fit, is introduced in this paper. In this approach, for a given case (example) with a missing attribute value, we search for another case that is as similar as possible to the given case. Cases can be considered as vectors of attribute values. The search is for the case that has as many identical attribute values as possible for symbolic attributes, or the smallest possible value differences for numerical attributes. There are two possible ways to conduct the search: within the same class (concept) as the case with the missing attribute values, or over the entire set of cases. For comparison, we also experimented with another approach to missing attribute values, in which the missing values are replaced by the most common value of the attribute for symbolic attributes or by the average value for numerical attributes. All algorithms were implemented in the system OOMIS. Our experiments were performed on the preterm birth data sets provided by the Duke University Medical Center.
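The closest-fit idea can be sketched as follows. This is an illustration under assumptions, not the OOMIS implementation: the distance (0/1 mismatch for symbolic attributes, range-normalized difference for numerical ones, 1 when either value is missing) and the names `distance` and `closest_fit` are hypothetical choices, since the abstract does not give the exact formula.

```python
def distance(case_a, case_b, numeric_ranges):
    """Assumed distance: per-attribute mismatch, summed over all attributes."""
    total = 0.0
    for i, (a, b) in enumerate(zip(case_a, case_b)):
        if a is None or b is None:
            total += 1.0                      # a missing value counts as maximally different
        elif i in numeric_ranges:
            total += abs(a - b) / numeric_ranges[i]
        else:
            total += 0.0 if a == b else 1.0
    return total

def closest_fit(cases, classes=None):
    """Fill each missing value (None) from the closest case that has that value.
    If `classes` is given, search only within the same class (concept)."""
    numeric_ranges = {}
    for i in range(len(cases[0])):
        present = [c[i] for c in cases if c[i] is not None]
        if present and all(isinstance(v, (int, float)) for v in present):
            numeric_ranges[i] = (max(present) - min(present)) or 1.0
    filled = [list(c) for c in cases]
    for idx, case in enumerate(cases):
        for i, value in enumerate(case):
            if value is not None:
                continue
            candidates = [
                j for j, other in enumerate(cases)
                if j != idx and other[i] is not None
                and (classes is None or classes[j] == classes[idx])
            ]
            if candidates:
                best = min(candidates, key=lambda j: distance(case, cases[j], numeric_ranges))
                filled[idx][i] = cases[best][i]
    return filled

cases = [("red", 1.0), ("red", None), ("blue", 9.0)]
print(closest_fit(cases))                             # global search copies 1.0 from the "red" case
print(closest_fit(cases, classes=["x", "y", "y"]))    # same-class search copies 9.0 instead
```

The two calls correspond to the two search modes the abstract mentions: over all cases, and restricted to the class of the incomplete case.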

4.
We present a method to learn maximal generalized decision rules from databases by integrating discretization, generalization and rough set feature selection. Our method reduces the data horizontally and vertically. In the first phase, discretization and generalization are integrated and the numeric attributes are discretized into a few intervals. The primitive values of symbolic attributes are replaced by high-level concepts, and some obviously superfluous or irrelevant symbolic attributes are eliminated. Horizontal reduction is accomplished by merging identical tuples after an attribute value has been replaced by its higher-level value in a predefined concept hierarchy (for symbolic attributes) or after continuous (numeric) attributes have been discretized. This phase greatly decreases the number of tuples in the database. In the second phase, a novel context-sensitive feature merit measure is used to rank the features, and a subset of relevant attributes is chosen based on rough set theory and the merit values of the features. A reduced table is obtained by removing the attributes that are not in the relevant subset, so the data set is further reduced vertically without destroying the interdependence relationships between the classes and the attributes. Rough set-based value reduction is then performed on the reduced table and all redundant condition values are dropped. Finally, tuples in the reduced table are transformed into a set of maximal generalized decision rules. Experimental results on UCI data sets and a real market database demonstrate that our method can dramatically reduce the feature space and improve learning accuracy.

5.
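To address the high time complexity of the C4.5 decision tree algorithm when handling continuous-valued attributes, a new decision tree construction method is proposed. The Pearson correlation coefficient between attributes is used to reduce the attributes of the data set; combined with each attribute's information gain ratio, an optimal subset of decision attributes is retained so that the subset contains no redundant attributes; and boundary-point detection is used to improve threshold splitting during the discretization of continuous-valued attributes, with a correction to the computation of the information gain ratio. A series of comparative experiments on data sets from the UCI repository, run on the PyCharm platform, shows that the improved C4.5 algorithm raises tree-generation efficiency by about 50% and accuracy by about 2%, and fairly effectively resolves the original C4.5 algorithm's attribute-selection bias toward continuous-valued attributes.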
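The abstract does not spell out its boundary-point test. The sketch below assumes the common definition: a candidate threshold is kept only if it lies between two adjacent distinct attribute values whose examples do not all belong to one and the same class. The function name `boundary_cut_points` is hypothetical.

```python
from itertools import groupby

def boundary_cut_points(values, labels):
    """Candidate thresholds: midpoints between adjacent distinct values
    whose examples are not all of one single class."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    # One entry per distinct value: (value, set of class labels seen at that value).
    groups = [(v, {lab for _, lab in grp}) for v, grp in groupby(pairs, key=lambda p: p[0])]
    cuts = []
    for (v1, labs1), (v2, labs2) in zip(groups, groups[1:]):
        if len(labs1) > 1 or len(labs2) > 1 or labs1 != labs2:
            cuts.append((v1 + v2) / 2)
    return cuts

print(boundary_cut_points([1, 2, 3, 4, 5, 6], list("aaabbb")))  # [3.5]
```

In this toy case only one of the five possible midpoints survives, so the gain ratio would be evaluated at one threshold instead of five. That pruning is the kind of saving the abstract attributes to boundary-point detection.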

6.
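Sequential three-way decision is an effective way to represent the multiple levels of granularity in a problem and to combine them to solve uncertain decision problems. The dominance-equivalence relation rough set targets classification problems whose condition attributes carry preference relations: it extracts ordinal information and approximates the target concept to form decision knowledge. Knowledge reduction and extraction with the traditional dominance-relation rough set approach is inefficient, while most current sequential three-way decision methods are confined to information systems with symbolic-valued attributes and cannot handle continuous and ordinal values effectively, which causes some loss of information. The idea of sequential three-way decisions is therefore applied to the dominance-relation rough set model, and a new attribute reduction based on sequential three-way decisions is defined together with the corresponding attribute significance, so that information systems with preference-valued attributes are processed more efficiently; studying the multi-granularity representation and its relations accelerates the knowledge reduction process. Experiments on several UCI data sets show that the proposed dominance-relation-based sequential three-way decision method markedly reduces time cost while preserving reduction quality.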

7.
The representation of knowledge has an important effect on automated decision-making. In this paper, vector spaces are used to describe a condition space and a decision space, and knowledge is represented by a mapping from the condition space to the decision space. Many such mappings can be obtained from a training set. A set of mappings created from multiple reducts in the training set is defined as multiknowledge. To obtain a good reduct and to find multiple reducts, the WADF (worst-attribute-drop-first) algorithm is developed by analyzing the properties of decision systems using rough set theory. An approach that combines multiknowledge and the naïve Bayes classifier is applied to make decisions for unseen instances or for instances with missing attribute values. Benchmark data sets from the UCI Machine Learning Repository are used to test the algorithms. The experimental results are encouraging: the prediction accuracy for unseen instances is higher with these algorithms than with other approaches based on a single body of knowledge.

8.
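Among the many existing methods for selecting test attributes in decision trees, none has been reported that integrates the handling of data with missing attribute values into the test-attribute selection process, and the existing methods for handling such data all introduce bias to varying degrees. A method is therefore proposed that uses an information gain ratio based on joint entropy as the criterion for selecting test attributes, so as to eliminate the influence of missing-value data on test-attribute selection while the tree is being built. Comparative experiments on the WEKA platform show that the improved algorithm raises overall execution efficiency and classification accuracy.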

9.
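Interval Pythagorean hesitant fuzzy sets can describe the results given by a decision maker more fully and completely, which makes them a powerful tool for representing uncertain phenomena. For decision problems under fuzzy information, a multi-attribute decision-making method based on interval Pythagorean hesitant fuzzy continuous entropy is proposed. A continuous interval Pythagorean hesitant fuzzy ordered weighted averaging (CIPHFOWA) operator and an interval Pythagorean hesitant fuzzy continuous entropy are introduced, along with methods for determining attribute weights when they are completely unknown or partially known. A multi-attribute decision-making method that combines this continuous entropy with the grey relational degree is then proposed, and a case study evaluating the improvement of the new rural medical care system illustrates its feasibility and effectiveness.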

10.
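Classical attribute reduction algorithms and their extensions are designed for information systems with decision attributes and cannot reduce the attributes of information systems that lack them. Based on rough set theory, this paper therefore studies information systems without decision attributes from the set-theoretic viewpoint of partitioning the universe, and proposes a heuristic attribute reduction algorithm suited to such systems. The algorithm can, to some extent, solve the attribute reduction problem for information systems without decision attributes, and it further extends the range of application of rough set theory. An example shows that the algorithm is effective and feasible.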

11.
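Existing knowledge discovery models for hybrid information systems mostly cover symbolic and numerical condition attributes with symbolic decision attributes, and most of them focus on attribute reduction or feature selection; rule extraction has received comparatively little study. A dynamic rule extraction model is built for hybrid information systems that cover more data types. First, the existing formula for the distance between attribute values is revised and a definition is given for the distance between cross-level attribute values, which yields a new hybrid distance. Second, three methods are proposed for inducing decision classes from numerical decision attributes. Then a generalized neighborhood rough set model is constructed, upper and lower approximations under dynamic granularity and a rule extraction algorithm are proposed, and a dynamic rule extraction model based on neighborhood granulation is built. The model can extract rules from information systems in which (1) the condition attribute set may include single-level symbolic, cross-level symbolic, numerical, interval-valued, set-valued and unknown-valued attributes, and (2) the decision attribute set may include symbolic and numerical attributes. Comparative experiments on data sets from the UCI repository show, in terms of classification accuracy, that the rule extraction algorithm is effective.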

12.
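A decision tree construction method based on dispersion degree   Total citations: 1, self-citations: 0, citations by others: 1
When constructing a decision tree, attribute selection affects the tree's classification accuracy. The limitations of the information-entropy method and the WMR method are discussed, and the concept of the dispersion degree of the condition attribute set in an information system is proposed. Using this concept to choose the splitting attribute during tree construction, a dispersion-based decision tree construction algorithm, DSD, is designed. DSD overcomes the limitations of the WMR method in practical applications. Experiments on UCI data sets show that the trees it builds are about as accurate as those of the information-entropy method, while its time complexity is lower.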

13.
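Positive-region-based attribute reduction algorithms repeatedly compute the relative significance of attributes during reduction, which makes them inefficient. To address this, a fast attribute reduction algorithm based on knowledge rough entropy is proposed from the perspectives of attribute measurement and search strategy. First, knowledge distance is introduced into the decision information system to define knowledge rough entropy, which measures the roughness of knowledge. Second, knowledge rough entropy is used as the criterion of attribute significance to assess the importance of each single attribute. Finally, all condition attributes are sorted by significance and redundant attributes are deleted according to attribute dependency, which gives a fast reduction. The proposed algorithm is compared with three other algorithms on six public data sets in terms of running efficiency and classification accuracy. The results show that its running efficiency is 83.24%, 28.77% and 59.92% higher than that of the three other algorithms, respectively, and that across three classifiers its classification accuracy is higher by 0.83%, 0.63% and 1.37% on average, respectively. The proposed algorithm therefore obtains a reduct faster while maintaining classification performance.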
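The last step of this procedure, deleting redundant attributes by dependency, can be illustrated with the standard rough set dependency degree, gamma = |POS_B(D)| / |U|. The sketch below does not implement knowledge rough entropy, which the abstract does not define; it assumes the attribute ranking is already given, and the names `dependency` and `reduce_attributes` are hypothetical.

```python
from collections import defaultdict

def dependency(rows, decisions, attrs):
    """gamma = |POS_attrs(D)| / |U|: share of objects whose equivalence class
    under `attrs` carries a single decision value."""
    blocks = defaultdict(list)
    for row, d in zip(rows, decisions):
        blocks[tuple(row[a] for a in attrs)].append(d)
    consistent = sum(len(ds) for ds in blocks.values() if len(set(ds)) == 1)
    return consistent / len(rows)

def reduce_attributes(rows, decisions, ranked_attrs):
    """Drop attributes, least important first, as long as the dependency
    of the full attribute set is preserved.
    `ranked_attrs` is ordered from most to least important (assumed given)."""
    target = dependency(rows, decisions, ranked_attrs)
    kept = list(ranked_attrs)
    for a in reversed(ranked_attrs):
        trial = [x for x in kept if x != a]
        if trial and dependency(rows, decisions, trial) == target:
            kept = trial
    return kept

# Toy decision table (hypothetical): attribute 0 alone determines the decision.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = ["n", "n", "y", "y"]
print(reduce_attributes(rows, decisions, [0, 1, 2]))  # [0]
```

Because the ranking is computed once and each attribute is tested once, a loop of this shape avoids the repeated significance computations that the abstract identifies as the bottleneck of positive-region-based algorithms.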

14.
This paper proposes an intuitionistic fuzzy decision method based on prospect theory and the evidential reasoning approach for multi-attribute decision-making problems in which the criteria values are intuitionistic fuzzy numbers and the attribute weights are unknown. Firstly, measures of entropy and cross entropy are defined for intuitionistic fuzzy sets, taking into account the decision maker's preference regarding the hesitancy degree. Secondly, combined with bounded rationality, the prospect decision matrix is calculated using prospect theory and the intuitionistic fuzzy distance. Thirdly, correlation analyses are conducted between the attribute weights and three indicators (entropy, cross entropy and prospect value), and optimization models for identifying attribute weights are built for the cases in which the weights are incomplete or unknown. Finally, to avoid losing decision-making information, the evidential reasoning approach is applied to calculate the comprehensive prospective values of all alternatives. The ranking and the optimal alternative are then determined from these comprehensive prospective values. Illustrative examples demonstrate that the proposed method is reasonable and feasible.

15.
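An inductive learning method for decision tables with continuous condition attribute values   Total citations: 1, self-citations: 0, citations by others: 1
An inductive learning method is proposed for decision tables composed of continuous condition attribute values and discrete decision attribute values. The continuous condition attribute values in the table are treated as a matrix, and singular value decomposition of that matrix is used to determine the number of condition attributes of the decision table. Fuzzy C-means clustering is applied to the continuous condition attribute values with different numbers of clusters, which gives a discrete decision table for each number of clusters; condition attribute reduction on these tables yields different numbers of condition attributes. The final number of clusters is determined by comparing the number of condition attributes obtained from the singular value decomposition with the numbers obtained after reducing the discrete decision tables, while also taking the number of decision attributes into account. On this basis, the inductive learning method for such decision tables is given and its effectiveness is verified.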

16.

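Studying attribute reduction for neighborhood decision systems from the information view is a novel approach. An inconsistent neighborhood matrix is defined by analyzing whether the other samples in the neighborhood of a sample in the universe share that sample's decision attribute value. When attribute significance is computed, the inconsistent neighborhoods are used to cut the time needed to compute the conditional entropy after one attribute is added to the current condition attributes. The relationship between conditional entropy and the positive region in neighborhood systems is derived, an attribute reduction algorithm based on the inconsistent neighborhood matrix under the information view is proposed, and its intrinsic connections with other algorithms are analyzed. Experimental results verify the effectiveness of the proposed algorithm.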

17.
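The information-entropy-based binary splitting algorithm for discretizing continuous attributes falls short when it is used for analysis and prediction on data sets with many continuous attributes and large data volumes. Experiments show that when an improved k-means algorithm is used within the decision tree algorithm as the discretization method for continuous attributes, better decision trees can be constructed for data with many continuous attributes.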
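The abstract does not describe its improvements to k-means, so the sketch below substitutes plain Lloyd's k-means on a single attribute, with a deterministic initialization chosen here for reproducibility. It only illustrates how cluster centers can be turned into discretization intervals; all function names are hypothetical.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd's k-means on one attribute; returns sorted cluster centers."""
    ordered = sorted(values)
    # Deterministic start (an assumption): spread the centers over the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)

def cut_points(centers):
    """Interval boundaries: midpoints between neighboring cluster centers."""
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]

def discretize(values, cuts):
    """Map each value to the index of the interval it falls in."""
    return [sum(v > c for c in cuts) for v in values]

values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
cuts = cut_points(kmeans_1d(values, 3))
print(discretize(values, cuts))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Unlike entropy-based binary splitting, this discretization is unsupervised and produces k intervals in one pass per attribute, which is one plausible reason for its appeal on data sets with many continuous attributes.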

18.
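To address the problem of measuring uncertainty in rough set data analysis, this paper first constructs a new conditional entropy of the target concept and a new conditional entropy of decision knowledge, both of which take into account the degree to which condition attributes are missing. On this basis, a conditional-entropy-based technique for determining attribute weights and a minimum-conditional-entropy method for filling in incomplete attribute values are proposed to solve incomplete multi-attribute decision problems in which the attribute weights are completely unknown. Analysis of an application example shows that the method effectively incorporates coarse-grained preliminary grading information, determines the values of decision factors objectively, is highly interpretable, and yields more reasonable and effective decision results.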

19.
洪菁, 陈强, 刘惠彬. 《微机发展》, 2006, 16(10): 32-34
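Traditional rough set theory is extended and an improved rough set inductive learning method is proposed. On the one hand, to discretize continuous attributes, fuzzy set theory is used to fuzzify them, a fuzzy similarity matrix is built from the fuzzy closeness degree, the k-w method is used to roughly assess the significance of each continuous attribute, and a partition based on the fuzzy similarity relation is established, which finally produces a consistent decision table. On the other hand, to select the optimal attribute, a weighted-sum definition of attribute significance is proposed. A prototype system was developed on the basis of this model, and an engineering example verifies the effectiveness of the method.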

20.
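For multi-attribute group decision-making problems in which the decision information takes the form of Pythagorean hesitant fuzzy numbers, a multi-attribute group decision-making method based on Pythagorean hesitant fuzzy cross-entropy is proposed. The concept of Pythagorean hesitant fuzzy cross-entropy is introduced. Using this cross-entropy to measure the degree of difference between pieces of decision information, models for determining expert weights and attribute weights are proposed. A TOPSIS method based on Pythagorean hesitant fuzzy entropy is then proposed, and a photovoltaic power station siting case illustrates the feasibility and effectiveness of the method.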
