Similar literature
1.
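Construction and study of a multivariate decision tree   Cited by: 3 (self-citations: 0, citations by others: 3)
Univariate decision tree algorithms produce trees that are large, with complex rules that are hard to understand, whereas multivariate decision trees are an effective data mining method for classification. The key to constructing one is to choose, based on the correlation among attributes, a suitable combination of attributes that forms a new attribute used as a node. Combining the knowledge dependency measure from rough set theory with the notion of dispersion of the condition attribute set in an information system, the paper proposes a construction algorithm for multivariate decision trees (RD). Experiments on several UCI data sets show that the proposed algorithm classifies somewhat better than the traditional ID3 algorithm and than a core-based multivariate decision tree.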

2.
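To address the complexity of traditional attribute models for data classification and their low efficiency on big data, the paper proposes, for a cloud computing environment, an efficient big-data classification method that combines a deep attribute weighting Bayesian (AWB) algorithm with an improved differential information tree (DIT). The AWB algorithm builds a fuzzy knowledge base from the big-data training set to improve classification accuracy. The improved DIT performs fuzzy rough set attribute reduction; information is partitioned in parallel using a map function, and a shuffle algorithm is incorporated into the design of the fuzzy classifier to improve classification efficiency. The method's performance is evaluated on large network data sets with the CloudSim simulator. The experimental results show that the method improves classification accuracy, reduces computation time, and improves computational efficiency.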

3.
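To improve classification on imbalanced data, the paper combines hesitant fuzzy set theory with decision tree algorithms and proposes an improved fuzzy decision tree algorithm. The imbalanced data are oversampled with SMOTE, the cluster centers of each attribute are obtained with K-means clustering, and the data set is fuzzified with two different membership functions. On this basis, the hesitant fuzzy information gain of each attribute is computed from the membership functions and the information energy of the hesitant fuzzy sets, and its maximum replaces the fuzzy information gain of the Fuzzy ID3 algorithm as the attribute splitting criterion, yielding a hesitant fuzzy decision tree model for classifying imbalanced data. Experiments show that the classifier based on the hesitant fuzzy decision tree improves AUC by an average of 12.6% over traditional classifiers such as C4.5, KNN, and random forest.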

4.
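A study of a multivariate decision tree method   Cited by: 1 (self-citations: 1, citations by others: 0)
Univariate decision tree algorithms produce trees that are large, with complex rules that are hard to understand. Combining the relative core from rough set theory with a weighted roughness method, this paper proposes a new multivariate decision tree algorithm. An example shows that the decision tree produced by the proposed method is simpler than the one built by the traditional ID3 algorithm and classifies well.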

5.
曹鹏  李博  栗伟  赵大哲 《计算机应用》2013,33(2):550-553
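To address the low accuracy and reduced efficiency of classification on large-scale data, the paper proposes an adaptive random subspace ensemble classification algorithm combined with X-means clustering. X-means clustering first decomposes the complex data space automatically into several sample subspaces for divide-and-conquer learning while preserving the original data structure. The adaptive random subspace ensemble classifier then increases the diversity of the base classifiers and determines their number automatically, which improves the robustness and accuracy of the ensemble. The algorithm was tested on synthetic and UCI data sets and compared with traditional single and ensemble classification algorithms. The results show that, on large-scale data sets, the method achieves better classification accuracy and robustness and improves overall efficiency.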

6.
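Decision tree construction algorithm based on decision classification entropy and its application   Cited by: 1 (self-citations: 0, citations by others: 1)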
董广  王兴起 《计算机应用》2009,29(11):3103-3106
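To better perform classification mining on financial data sets, the paper introduces the concept of decision classification entropy on the basis of rough set theory, and then proposes a decision tree construction algorithm that uses the decision classification entropy of an attribute as the splitting measure. To counter overfitting, it also proposes a suppression parameter that keeps the tree size under control. A worked example and experiments on financial data sets show that, compared with the classic C4.5 decision tree algorithm, the new algorithm overcomes C4.5's shortcomings reasonably well, builds better decision trees, and performs the classification task better.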

7.
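The C4.5 decision tree algorithm does not account for the influence of each attribute on the classification result when classifying multidimensional data, which lowers classification accuracy. To address this, the paper proposes a C4.5 ensemble decision tree algorithm based on distance weights. Distance weights for the data attributes are defined from the standardized Euclidean distance and used to update the information gain ratio of C4.5, giving a distance-weighted C4.5 algorithm. The improved C4.5 classifier is used to train several base classifiers, which are combined into an ensemble decision tree with Bagging. Experiments show that the algorithm is accurate and stable on multidimensional data.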

8.
向东  赵勇  陈阳 《计算机科学》2012,39(3):192-195
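The paper proposes a classification algorithm (MLD-RLSC) based on the manifold regularization framework to solve the problem of classifying high-dimensional documents. The algorithm builds a nearest-neighbor graph of the training samples to estimate the geometric structure of the data space and uses it as the manifold regularization term. Combined with multivariate linear regression, this yields the low-dimensional manifold structure of the high-dimensional documents, and a k-nearest-neighbor classifier then classifies in the low-dimensional manifold, giving a classifier for multi-class problems. The algorithm makes full use of the class information of the training samples to extract effective features. Experiments on the Reuters-21578 data set show that its classification performance and running speed are considerably better than those of traditional classifiers.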

9.
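ID3 is a decision tree learning algorithm based on information entropy. It uses information entropy as the criterion for selecting the test attribute, classifies the set of training instances, and builds a decision tree that predicts how the attributes partition the whole instance space. ID3 is effective on relatively small data sets but cannot handle large databases. The SLIQ classification algorithm uses several distinctive techniques that shorten learning time and make it possible to classify large disk-resident data sets without reducing accuracy. It is also faster and produces smaller trees.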

10.
马宗杰  刘华文 《计算机应用》2014,34(7):2058-2060
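To address label correlation and high dimensionality in multi-label data, the paper proposes a multi-label classification algorithm based on singular value decomposition and partial least squares regression, which performs both dimensionality reduction and regression analysis on multi-label data. First, the set of class labels is treated as a whole so that label correlations are taken into account. Second, singular value decomposition (SVD) is used to obtain the score vectors of the sample and label spaces, which reduces dimensionality. Finally, a multi-label classification model is built on partial least squares regression (PLSR). Experiments on four real high-dimensional data sets show that the algorithm produces effective classification results.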

11.
Univariate decision trees are classifiers currently used in many data mining applications. This classifier discovers partitions in the input space via hyperplanes that are orthogonal to the axes of attributes, producing a model that can be understood by human experts. One disadvantage of univariate decision trees is that they produce complex and inaccurate models when decision boundaries are not orthogonal to axes. In this paper we introduce the Fisher's Tree, a classifier that takes advantage of the dimensionality reduction of Fisher's linear discriminant and uses the decomposition strategy of decision trees to come up with an oblique decision tree. Our proposal generates an artificial attribute that is used to split the data recursively. The Fisher's decision tree induces oblique trees whose accuracy, size, number of leaves and training time are competitive with respect to other decision trees reported in the literature. We use more than ten publicly available data sets to demonstrate the effectiveness of our method.
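
The idea of splitting on an artificial attribute produced by Fisher's linear discriminant can be illustrated for a single node. The sketch below is a minimal illustration, not the paper's implementation: it assumes two classes of 2-D points, uses the textbook direction w = Sw^-1 (m1 - m0), and places the threshold at the midpoint of the projected class means, which is an assumption made here for simplicity. The function names are hypothetical.

```python
def fisher_direction(class0, class1):
    """Return (w, m0, m1) where w = Sw^-1 (m1 - m0) for lists of 2-D points."""
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points, m):
        # Entries of the 2x2 within-class scatter matrix [[sxx, sxy], [sxy, syy]].
        sxx = sum((p[0] - m[0]) ** 2 for p in points)
        sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        syy = sum((p[1] - m[1]) ** 2 for p in points)
        return sxx, sxy, syy

    m0, m1 = mean(class0), mean(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # Sw = [[a, b], [b, c]]
    det = a * c - b * b
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # Closed-form inverse of a symmetric 2x2 matrix applied to (dx, dy).
    w = ((c * dx - b * dy) / det, (a * dy - b * dx) / det)
    return w, m0, m1


def project(w, point):
    """Artificial attribute: the point's coordinate along direction w."""
    return w[0] * point[0] + w[1] * point[1]


def oblique_split(class0, class1):
    """Return (w, threshold); a point goes right when project(w, p) > threshold."""
    w, m0, m1 = fisher_direction(class0, class1)
    threshold = (project(w, m0) + project(w, m1)) / 2  # midpoint rule (assumed)
    return w, threshold


# Two classes separated by the oblique line y = x, which no single
# axis-parallel test can reproduce.
below = [(1, 0), (2, 1), (3, 2), (2, 0.5)]
above = [(0, 1), (1, 2), (2, 3), (0.5, 2)]
w, t = oblique_split(below, above)
print(w, t)
```

A full oblique tree would apply the same step recursively to the two resulting subsets until a stopping criterion is met.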

12.
Mining streaming data is a hot topic in data mining. When performing classification on data streams, traditional classification algorithms based on decision trees, such as ID3 and C4.5, are relatively inefficient in both time and space due to the characteristics of streaming data. Random decision trees offer some advantages in time and space. An incremental algorithm for mining data streams, SRMTDS (Semi-Random Multiple decision Trees for Data Streams), based on random decision trees is proposed in this paper. SRMTDS uses the inequality of Hoeffding bounds to choose the minimum number of split-examples, a heuristic method to compute the information gain for obtaining the split thresholds of numerical attributes, and a Naive Bayes classifier to estimate the class labels of tree leaves. Our extensive experimental study shows that SRMTDS has improved performance in time, space, accuracy and anti-noise capability in comparison with VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams.
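
How a Hoeffding bound can yield a minimum number of examples is easy to show with the standard form of the inequality, epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the observed quantity and 1 - delta the desired confidence. The sketch below uses that general formula only; how SRMTDS itself applies the bound is not stated in the abstract, and the function names are hypothetical.

```python
import math


def hoeffding_epsilon(value_range, delta, n):
    """Deviation bound: with probability 1 - delta, the true mean lies within
    epsilon of the mean observed over n independent examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def min_examples(value_range, delta, epsilon):
    """Smallest n for which the Hoeffding bound is at most epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))


# Example: a quantity with range 1, confidence 1 - 1e-6, tolerance 0.05.
n = min_examples(1.0, 1e-6, 0.05)
print(n, hoeffding_epsilon(1.0, 1e-6, n))
```

Because epsilon shrinks only with the square root of n, halving the tolerance requires about four times as many examples.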

13.
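The paper argues that decision tree classifiers remain important among data mining techniques for classification, discusses the ID3 and C4.5 algorithms for decision tree induction, and explains the rules for selecting the decision attribute. Worked examples trace how ID3 and C4.5 run, and the results show that C4.5 is superior to ID3, in particular more reasonable and accurate on data whose attributes take many values.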
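
The difference between the two attribute selection rules can be seen on a toy table. The sketch below implements the textbook definitions of information gain (ID3) and gain ratio (C4.5); the data set, the attribute names `id` and `windy`, and the function names are invented for illustration and do not come from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain(rows, labels, attr):
    """ID3 criterion: label entropy minus weighted entropy after splitting on attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attr):
    """C4.5 criterion: information gain divided by the split information."""
    split_info = entropy([row[attr] for row in rows])
    if split_info == 0:
        return 0.0  # attribute has a single value and cannot split the data
    return information_gain(rows, labels, attr) / split_info


# 'id' takes a different value on every row; 'windy' is binary.
windy = [True, True, True, True, False, False, False, False]
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "yes"]
rows = [{"id": i, "windy": w} for i, w in enumerate(windy)]

for attr in ("id", "windy"):
    print(attr, information_gain(rows, labels, attr), gain_ratio(rows, labels, attr))
```

On this table, information gain ranks the row identifier first because it isolates every example, whereas gain ratio penalizes its many values and ranks `windy` first.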

14.
Optic nerve disease is an important and common disease. In this paper, we propose a hybrid diagnostic system based on a discretization (quantization) method and classification algorithms including the C4.5 decision tree classifier, artificial neural network (ANN), and least square support vector machine (LSSVM) to diagnose optic nerve disease from Visual Evoked Potential (VEP) signals with discrete values. The aim of this paper is to investigate the effect of the discretization method on the classification of optic nerve disease. Since the VEP signals are not linearly separable, classifier algorithms may achieve low classification accuracy on them. In order to overcome this problem, we have used the discretization method as data pre-processing. The proposed method consists of two phases: (i) quantization of VEP signals using the discretization method, and (ii) diagnosis of discretized VEP signals using classification algorithms including the C4.5 decision tree classifier, ANN, and LSSVM. The classification accuracies obtained by these hybrid methods (combination of C4.5 decision tree classifier-quantization method, combination of ANN-quantization method, and combination of LSSVM-quantization method) with and without quantization strategy are 84.6-96.92%, 94.20-96.76%, and 73.44-100%, respectively. As can be seen from these results, the best model for classifying optic nerve disease from VEP signals is the combination of the LSSVM classifier and the quantization strategy. The results indicate that the proposed method can support effective interpretation and point to the feasibility of designing a new intelligent diagnostic assistance system.

15.

Learning from patient records may aid medical knowledge acquisition and decision making. Decision tree induction, based on ID3, is a well-known approach to learning from examples. In this article we introduce a new data representation formalism that extends the original ID3 algorithm. We propose a new algorithm, ID+, which adopts this representation scheme. ID+ provides the capability of modeling dependencies between attributes or attribute values and of handling multiple values per attribute. We demonstrate our work via a series of medical knowledge acquisition experiments that are based on a "real-world" application of acute abdominal pain in children. In the context of these experiments, we compare ID+ with C4.5, NewId, and a Naive Bayesian classifier. Results demonstrate that the rules acquired via ID+ improve decision tree clinical comprehensibility and complement explanations supported by the Naive Bayesian classifier, while the decrease in classification accuracy is marginal.

16.
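Multivariate decision tree for big data classification of distributed data streams   Cited by: 1 (self-citations: 0, citations by others: 1)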
张宇  包研科  邵良杉  刘威 《自动化学报》2018,44(6):1115-1127
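In distributed data stream big data, class boundaries are irregular and change easily, so an ensemble classifier built on univariate decision trees needs a large number of base classifiers to approximate the class boundaries accurately, which degrades the ensemble's learning and classification performance. The paper therefore proposes a multivariate decision tree based on geometric outline similarity. Guided by an optimal reference vector, sample points in n-dimensional space are projected onto a one-dimensional space to form an ordered set of projected points; this set is divided into subsets by the class projection boundaries; the intersections of the sets belonging to different classes are then recursively projected and split; and a decision tree is finally generated. Experiments show that the proposed multivariate decision tree, GODT, achieves high classification accuracy with low training time, combining the learning efficiency of univariate decision trees with the representational power of multivariate decision trees.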

17.
A decision tree rule generation algorithm based on attribute relevance   Cited by: 5 (self-citations: 0, citations by others: 5)
范洁  常晓航  杨岳湘 《计算机仿真》2006,23(12):90-92,103
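Decision trees are widely used in data mining because they are structurally simple, easy to understand, and fairly accurate; their rule generation algorithms extract and simplify decision tree rules. The basic idea of attribute relevance analysis is to compute a measure that quantifies how relevant an attribute is to a given concept. The paper proposes c-c4.5rules, a rule generation algorithm for C4.5 decision trees based on attribute relevance, which can replace the original C4.5 rule generation algorithm. c-c4.5rules takes the associations among attributes fully into account when simplifying rules, and experiments show that it speeds up rule generation while preserving the original classification accuracy.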

18.
This paper explores the potential of an artificial immune-based supervised classification algorithm for land-cover classification. This classifier is inspired by the human immune system and possesses properties similar to nonlinear classification, self/non-self identification, and negative selection. Landsat ETM+ data of an area lying in Eastern England near the town of Littleport are used to study the performance of the artificial immune-based classifier. A univariate decision tree and a maximum likelihood classifier were used as baselines for classification accuracy and computational cost. Results suggest that the artificial immune-based classifier works well in comparison with the maximum likelihood and the decision-tree classifiers in terms of classification accuracy. The computational cost of the artificial immune-based classifier is higher than that of the decision tree but lower than that of the maximum likelihood classifier. Another data set from an area in Spain is also used to compare the performance of the immune-based supervised classifier with maximum likelihood and decision-tree classification algorithms. Results suggest an improved performance with the immune-based classifier in terms of classification accuracy with this data set, too. The design of an artificial immune-based supervised classifier requires several user-defined parameters to be set, so this work is extended to study the effect of varying the values of six parameters on classification accuracy. Finally, a comparison with a backpropagation neural network suggests that the neural network classifier provides higher classification accuracies with both data sets, but the results are not statistically significant.

19.
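Decision tree classification of network traffic   Cited by: 2 (self-citations: 1, citations by others: 1)
Application identification and traffic classification are prerequisites for network management, security, research, and related tasks. With the rapid growth of networks and the steady emergence of new applications, classification techniques based on transport-layer port numbers and deep packet inspection can no longer meet demand. This paper verifies that the statistical properties of network traffic can effectively distinguish different applications, proposes a supervised traffic classification method based on the C4.5 decision tree classifier, and discusses two refinements: boosting and feature selection. Experiments show that the C4.5 classifier has moderate training complexity, high accuracy, and fast classification. Boosting raises accuracy further at the cost of much longer training and slightly slower classification, while feature selection speeds up classification at a slight cost in accuracy.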

20.
The aim of this research was to compare classifier algorithms including the C4.5 decision tree classifier, the least squares support vector machine (LS-SVM) and the artificial immune recognition system (AIRS) for diagnosing macular and optic nerve diseases from pattern electroretinography signals. The pattern electroretinography signals were obtained by electrophysiological testing devices from 106 subjects with optic nerve and macular disease. In order to show the test performance of the classifier algorithms, the classification accuracy, receiver operating characteristic curves, sensitivity and specificity values, confusion matrix and 10-fold cross-validation have been used. The classification results obtained are 85.9%, 100% and 81.82% for the C4.5 decision tree classifier, the LS-SVM classifier and the AIRS classifier respectively using 10-fold cross-validation. It is shown that the LS-SVM classifier is a robust and effective classifier system for the determination of macular and optic nerve diseases.
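
The evaluation measures named above follow directly from a binary confusion matrix. The sketch below shows the standard definitions of accuracy, sensitivity and specificity; the labels and predictions are made-up values for illustration, not data from the study, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion matrix, accuracy, sensitivity and specificity for paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    return {
        "confusion_matrix": [[tp, fn], [fp, tn]],  # rows: actual positive, actual negative
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Illustrative labels only: 1 = diseased, 0 = healthy.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(binary_metrics(y_true, y_pred))
```

Under 10-fold cross-validation these quantities would be computed on each held-out fold and then averaged, or computed once over the pooled predictions.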
