Similar literature
1.
The availability of large amounts of medical data creates a need for intelligent disease prediction and analysis tools that can extract hidden information. A large number of data mining and statistical analysis tools are used for disease prediction. Single data‐mining techniques show an acceptable level of accuracy for heart disease diagnosis. This article focuses on the prediction and analysis of heart disease using a weighted vote‐based classifier ensemble technique. The proposed ensemble model overcomes the limitations of conventional data‐mining techniques by employing an ensemble of five heterogeneous classifiers: naive Bayes, decision tree based on Gini index, decision tree based on information gain, instance‐based learner, and support vector machines. We have used five benchmark heart disease data sets taken from the UCI repository. Each data set contains a different feature set that ultimately leads to the prediction of heart disease. The effectiveness of the proposed ensemble classifier is investigated by comparing its performance with other researchers' techniques. Tenfold cross‐validation is used to handle the class imbalance problem. Moreover, confusion matrices and analysis of variance statistics are used to show the prediction results of all classifiers. The experimental results verify that the proposed ensemble classifier can deal with all types of attributes and that it achieved a high diagnosis accuracy of 87.37%, sensitivity of 93.75%, specificity of 92.86%, and F‐measure of 82.17%. The F‐ratio higher than the F‐critical and p‐value less than 0.01 for a 95% confidence interval indicate that the results are statistically significant for all the data sets.
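
A minimal sketch of the weighted voting idea in the abstract above, assuming each base classifier outputs one label and carries one weight. The abstract does not say how the weights are derived, so the values here (and the function name `weighted_vote`) are purely illustrative.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per base classifier.
    weights: one non-negative weight per base classifier (hypothetical values,
             e.g. each classifier's validation accuracy).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties are broken by label order so the result is deterministic.
    return max(sorted(totals), key=lambda label: totals[label])

# Five base classifiers, as in the abstract; the weights are made up.
votes = ["disease", "healthy", "disease", "healthy", "healthy"]
weights = [0.90, 0.60, 0.85, 0.55, 0.50]
print(weighted_vote(votes, weights))
```

With these made-up weights the two "disease" votes outweigh the three "healthy" votes, which is the behaviour that separates weighted voting from a plain majority vote.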

2.
This paper presents the cluster‐based ensemble classifier, an approach toward generating an ensemble of classifiers using multiple clusters within classified data. Clustering is incorporated to partition the data set into multiple clusters of highly correlated data that are difficult to separate otherwise, and different base classifiers are used to learn class boundaries within the clusters. As the different base classifiers engage on different difficult‐to‐classify subsets of the data, the learning of the base classifiers is more focussed and accurate. A selection rather than fusion approach achieves the final verdict on patterns of unknown classes. The impact of clustering on the learning parameters and accuracy of a number of learning algorithms including neural network, support vector machine, decision tree and k‐NN classifier is investigated. A number of benchmark data sets from the UCI machine learning repository were used to evaluate the cluster‐based ensemble classifier, and the experimental results demonstrate its superiority over bagging and boosting.
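
The "selection rather than fusion" step can be illustrated with a short sketch. It assumes that selection means routing a pattern to the classifier of its nearest cluster centroid; the abstract does not state the selection rule, and the centroids and threshold classifiers below are hypothetical stand-ins.

```python
import math

def nearest_cluster(x, centroids):
    """Index of the centroid closest to x in Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))

def predict_by_selection(x, centroids, cluster_classifiers):
    """Selection, not fusion: only the chosen cluster's classifier answers."""
    return cluster_classifiers[nearest_cluster(x, centroids)](x)

# Hypothetical setup: two clusters, each with its own simple threshold rule.
centroids = [(0.0, 0.0), (10.0, 10.0)]
cluster_classifiers = [
    lambda x: "A" if x[0] < 1.0 else "B",  # boundary learned inside cluster 0
    lambda x: "C" if x[1] < 9.0 else "D",  # boundary learned inside cluster 1
]
print(predict_by_selection((0.5, 0.2), centroids, cluster_classifiers))
print(predict_by_selection((9.0, 9.5), centroids, cluster_classifiers))
```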

3.
Clustering algorithms can be optimized using nature‐inspired techniques. Many algorithms inspired by nature, such as the firefly algorithm and the ant colony optimization algorithm, have improved clustering results. k‐means is a popular clustering technique but is limited by local optima, a limitation that its various hybrids have overcome. k‐means++ is a hybrid k‐means clustering algorithm that provides a procedure for initializing the cluster centres. In the proposed work, hybrids of nature‐inspired techniques using the cuckoo and krill herd algorithms are implemented on the k‐means++ algorithm to enhance cluster quality and generate optimized clusters. The designed algorithms are implemented, and the results are compared with their counterparts. Performance parameters such as accuracy, f‐measure, error rate, standard deviation, CPU time, and cluster quality check are used to measure the clustering capabilities of these algorithms. The results indicate the high performance of the newly designed algorithms.
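
For reference, the standard k‐means++ seeding procedure mentioned in the abstract above can be sketched as follows. This covers only the plain initialization step, not the cuckoo or krill herd hybrids; the function name and sample points are illustrative.

```python
import random

def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centres with k-means++ seeding (D^2 weighting)."""
    rng = rng or random.Random()
    centres = [rng.choice(points)]
    while len(centres) < k:
        # Squared distance from each point to its nearest chosen centre.
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres)
              for p in points]
        if sum(d2) == 0:
            raise ValueError("fewer than k distinct points")
        # Sample the next centre with probability proportional to d2.
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
print(kmeans_pp_init(points, 3, random.Random(0)))
```

Points that are already centres have zero weight, so the sampled centres are always distinct; far-away points are favoured, which is what reduces the risk of a poor local optimum.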

4.
In this research, a hybrid model is developed by integrating a case-based data clustering method and a fuzzy decision tree for medical data classification. Two datasets from the UCI Machine Learning Repository, i.e., the liver disorders dataset and Breast Cancer Wisconsin (Diagnosis), are employed for the benchmark test. Initially, a case-based clustering method is applied to preprocess the dataset so that more homogeneous data within each cluster is attained. A fuzzy decision tree is then applied to the data in each cluster, and genetic algorithms (GAs) are further applied to construct a decision-making system based on the selected features and diseases identified. Finally, a set of fuzzy decision rules is generated for each cluster. As a result, the FDT model can accurately react to the test data by the inductions derived from the case-based fuzzy decision tree. The average forecasting accuracy of the CBFDT model is 98.4% for breast cancer and 81.6% for liver disorders. The accuracy of the hybrid model is the highest among the models compared. The hybrid model can produce accurate and also comprehensible decision rules that could potentially help medical doctors to extract effective conclusions in medical diagnosis.

5.
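To improve the recognition rate and real-time performance of pipeline-condition anomaly detection, an ensemble detection method is proposed that combines tabu-search-based semi-supervised K-means clustering with C4.5 decision trees. A cost-sensitive function is introduced into the tabu search to select the feature combination with the best classification performance and the best combination weights, which improves the recognition rate of the minority class under imbalanced data distributions. The semi-supervised K-means method first clusters the sample features into k clusters, and C4.5 is then used to refine the boundary of each cluster; this cascaded ensemble alleviates the imbalanced data distribution problem and improves pipeline detection accuracy. Three ensemble rules are also proposed: weighted sum, nearest consensus, and nearest neighbor. Experimental results verify the effectiveness of the algorithm, which achieves high classification accuracy in pipeline-condition anomaly detection.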

6.
NeC4.5: neural ensemble based C4.5
Decision trees offer good comprehensibility, while neural network ensembles offer strong generalization ability. These merits are integrated into a novel decision tree algorithm, NeC4.5. This algorithm first trains a neural network ensemble. Then, the trained ensemble is employed to generate a new training set by replacing the desired class labels of the original training examples with those output by the trained ensemble. Some extra training examples are also generated from the trained ensemble and added to the new training set. Finally, a C4.5 decision tree is grown from the new training set. Since its learning results are decision trees, the comprehensibility of NeC4.5 is better than that of a neural network ensemble. Moreover, experiments show that the generalization ability of NeC4.5 decision trees can be better than that of C4.5 decision trees.
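
The data-generation step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the ensemble is combined by majority vote, extra examples are drawn uniformly within each feature's observed range, and simple threshold rules stand in for trained neural networks. All names are hypothetical.

```python
import random
from collections import Counter

def ensemble_label(ensemble, x):
    """Majority vote of the ensemble members on one example."""
    return Counter(member(x) for member in ensemble).most_common(1)[0][0]

def build_nec45_training_set(ensemble, examples, n_extra, rng):
    """Relabel the original examples and add randomly generated ones."""
    relabelled = [(x, ensemble_label(ensemble, x)) for x in examples]
    # Extra examples: sample each feature uniformly within its observed range
    # (an assumed generation scheme; the abstract does not specify one).
    lows = [min(col) for col in zip(*examples)]
    highs = [max(col) for col in zip(*examples)]
    extra = []
    for _ in range(n_extra):
        x = tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
        extra.append((x, ensemble_label(ensemble, x)))
    return relabelled + extra

# Stand-ins for trained neural networks: three simple threshold rules.
ensemble = [lambda x: int(x[0] > 0.5), lambda x: int(x[1] > 0.5),
            lambda x: int(x[0] + x[1] > 1.0)]
examples = [(0.1, 0.2), (0.9, 0.8), (0.4, 0.9)]
new_set = build_nec45_training_set(ensemble, examples, 2, random.Random(0))
print(len(new_set), new_set[:3])
```

The resulting list of (features, label) pairs is what a decision tree learner such as C4.5 would then be trained on.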

7.
Most of the methods that generate decision trees for a specific problem use the examples of data instances in the decision tree generation process. This article proposes a method called RBDT‐1 (rule‐based decision tree) for learning a decision tree from a set of decision rules that cover the data instances rather than from the data instances themselves. The goal is to create on demand a short and accurate decision tree from a stable or dynamically changing set of rules. The rules could be generated by an expert, by an inductive rule learning program that induces decision rules from the examples of decision instances such as AQ‐type rule induction programs, or extracted from a tree generated by another method, such as ID3 or C4.5. In terms of tree complexity (number of nodes and leaves in the decision tree), RBDT‐1 compares favorably with AQDT‐1 and AQDT‐2, which are methods that create decision trees from rules. RBDT‐1 also compares favorably with ID3 while it is as effective as C4.5, where both (ID3 and C4.5) are well‐known methods that generate decision trees from data examples. Experiments show that the classification accuracies of the decision trees produced by all methods under comparison are indistinguishable.

8.
The acquisition of data through remote sensing has become of great importance in precision agriculture, as it covers large geographical areas faster and more cheaply than ground inspections. The challenge is to develop technical solutions that can benefit both from the huge amounts of raw data extracted from satellite images and from the robust body of knowledge refined during centuries of agricultural practice. Aiming to accurately classify crops from satellite images, we developed a hybrid intelligent system that can exploit both agricultural expert knowledge and machine learning algorithms. As the raw crop data is characterized by heterogeneity, we turn our attention to ensemble learners, while expert knowledge is encapsulated within a rule-based system. Vote-based methods for resolving conflicts between an ensemble's base learners have difficulty classifying exceptional cases correctly and giving the rationale behind their decisions. The conceptual research question concerns conflict resolution in ensemble learning. To deal with debatable cases in ensemble learning and to increase transparency in such debatable decisions, our hypothesis is that argumentation could be more effective than voting-based methods. The main contribution is that the voting system in ensemble learning is replaced by an argumentation-based conflict resolver. Prospective decisions of base classifiers are presented to an argumentative system based on defeasible logic that performs dialectical reasoning on the pros and cons of a classification decision. The system computes a recommendation considering both the rules extracted from base learners and the available expert knowledge. The investigated case study deals with crop classification into four classes: corn, soybean, cotton, and rice. The test site used for the experiment is an area of 20 square kilometers in New Madrid County, in the southeast of the state of Missouri, USA.
The results show that our approach increases classification accuracy compared to the voting-based method for conflict resolution in an ensemble learner comprising three base classifiers: a decision tree, a neural network, and a support vector machine algorithm. We also argue that combining ensemble learning and argumentation fits the decision patterns of human agents, who first collect various opinions and then perform dialectical reasoning on these opinions. We think that the people who can benefit from the conceptual instrumentation presented in this work are decision makers in domains characterized by high data availability, robust expert knowledge, and a need for justifying the rationale behind decisions.

9.
This paper presents a novel host-based combinatorial method based on k-Means clustering and ID3 decision tree learning algorithms for unsupervised classification of anomalous and normal activities in computer network ARP traffic. The k-Means clustering method is first applied to the normal training instances to partition them into k clusters using Euclidean distance similarity. An ID3 decision tree is constructed on each cluster. Anomaly scores from the k-Means clustering algorithm and decisions of the ID3 decision trees are extracted. A special algorithm is used to combine the results of the two algorithms and obtain final anomaly score values. A threshold rule is applied to decide whether a test instance is normal. Experiments are performed on captured network ARP traffic. Some anomaly criteria have been defined and applied to the captured ARP traffic to generate normal training instances. Performance of the proposed approach is evaluated using five defined measures and empirically compared with the performance of the individual k-Means clustering and ID3 decision tree classification algorithms and of other proposed approaches based on Markovian chains and stochastic learning automata. Experimental results show that the proposed approach has specificity and positive predictive value of as high as 96 and 98%, respectively.

10.
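Incremental learning algorithm in the hybrid learning model HLM
The hybrid learning model HLM tightly integrates the concept acquisition algorithm HMCAP with the neural network algorithm FTART and can learn multiple concepts and continuous attributes. Its incremental learning algorithm is built on a binary hybrid decision tree structure and the FTART network. When new instances are added to the system, a single pass of incremental learning adjusts the existing structure, with no need to regenerate the decision tree and the neural network, which improves learning accuracy quickly and efficiently. This paper mainly introduces the incremental learning algorithm of this model.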

11.
Hybrid models based on feature selection and machine learning techniques have significantly enhanced the accuracy of standalone models. This paper presents a feature selection‐based hybrid‐bagging algorithm (FS‐HB) for improved credit risk evaluation. Two feature selection methods, chi‐square and principal component analysis, were used for ranking and selecting the important features from the datasets. The classifiers were built on 5 training and test data partitions of the input data set. The performance of the hybrid algorithm was compared with that of the standalone classifiers: feature selection‐based classifiers and bagging. The hybrid FS‐HB algorithm performed best for the qualitative dataset with fewer features and a tree‐based unstable base classifier. Its performance on numeric data was also better than that of the other standalone classifiers and comparable to bagging with only selected features. Its performance was better on the 70:30 data partition, and the type II error, which is very significant in risk evaluation, was also reduced significantly. The improved performance of FS‐HB is attributed to the important features used for developing the classifier, thereby reducing the complexity of the algorithm, and to the use of ensemble methodology, which added to the classical bias variance trade‐off and performed better than standalone classifiers.

12.
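Although intelligent vision systems have certain advantages in processing large-scale information, such as feature detection, extraction, and matching, they remain uncertain and fragile in deeper cognition. In particular, for visual cognition tasks built on visual perception, the relevant mathematical logic and image processing methods have not achieved a qualitative breakthrough, and intelligent algorithms can hardly replace humans in performing relatively complex operations such as understanding, reasoning, decision making, and learning. To support the further development of intelligent visual perception and cognition technology, this paper summarizes the current state of applications of hybrid-augmented intelligence in the field of visual cognition, presents a basic architecture for hybrid-augmented visual cognition, and reviews the application areas and key technologies that can be incorporated into this architecture. First, based on an analysis of the connotation and basic scope of intelligent visual perception, and integrating human visual perception with psychological cognition, we discuss the definition, scope, and deepening process of hybrid-augmented visual cognition, compare the different stages of visual information processing, and then, based on an analysis of the development status of relevant cognitive models, construct a basic framework for hybrid-augmented visual cognition. This architecture can not only rely on intelligent algorithms for fast detection, recognition, understanding, and other processing, exploiting the computing potential of the "machine" to the greatest extent, but can also effectively enhance the accuracy and reliability of system cognition through timely and appropriate human reasoning, prediction, and decision making, giving full play to human cognitive advantages. Second, we discuss representative applications that can be incorporated into this architecture, and their open problems, in four areas: hybrid-augmented visual monitoring, visual driving, visual decision making, and visual sharing. We point out that the hybrid-augmented visual cognition architecture is a way to make better use of computer efficiency and reduce the human burden of information processing under current technical conditions. Finally, based on the high-, mid-, and low-level computer vision processing hierarchy, we analyze the macro and micro relationships among some of the mid- and high-level visual processing technologies in the architecture, and focus on reviewing key technologies such as visual analytics, visual enhancement, visual attention, visual understanding, visual reasoning, interactive learning, and cognitive evaluation. The hybrid-augmented visual cognition architecture helps break through the current "weak artificial intelligence" bottleneck in visual information cognition and will strongly promote the development of intelligent vision systems toward deep human-machine integration. As next steps, more in-depth research is needed on purely foundational innovation, efficient human-computer interaction, and flexible connection pathways.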

13.
This paper presents a novel method for the differential diagnosis of erythemato-squamous disease. The proposed method is based on fuzzy weighted pre-processing, k-NN (nearest neighbor) based weighted pre-processing, and a decision tree classifier, and consists of three parts. In the first part, we used a decision tree classifier to diagnose erythemato-squamous disease. In the second part, fuzzy weighted pre-processing, a new method that we improved, was first applied to the inputs of the erythemato-squamous disease dataset; the resulting weighted inputs were then classified using the decision tree classifier. In the third part, k-NN based weighted pre-processing, likewise a new method that we improved, was applied to the inputs of the dataset, and the resulting weighted inputs were classified by the decision tree classifier. The decision tree classifier, the fuzzy weighted pre-processing decision tree classifier, and the k-NN based weighted pre-processing decision tree classifier reached classification accuracies of 86.18%, 97.57%, and 99.00%, respectively, using 20-fold cross validation.

14.
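Research on the generalization performance of ensemble learning has been very successful, but the error analysis of ensemble learning needs further study. Because cross-validation plays an important role in model performance evaluation in statistical machine learning, blocked 3×2 cross-validation and k-fold cross-validation are applied to form, for each sample point, an ensemble of weighted predictions, and an error analysis is carried out. Experiments on simulated and real data show that the prediction error of ensemble learning based on blocked 3×2 cross-validation is smaller than that of a single learner, and that the variance of the ensemble is smaller than that of a single learner. Compared with ensemble learning based on k-fold cross-validation, the generalization error based on blocked 3×2 cross-validation is smaller, indicating that the ensemble learning model based on blocked 3×2 cross-validation is more stable.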

15.
The aim of this paper is to propose a new hybrid data mining model based on a combination of various feature selection and ensemble learning classification algorithms, in order to support the decision making process. The model is built through several stages. In the first stage, the initial dataset is preprocessed and, apart from applying different preprocessing techniques, we paid great attention to feature selection. Five different feature selection algorithms were applied, and their results, based on ROC and accuracy measures of the logistic regression algorithm, were combined using different voting types. We also proposed a new voting method, called if_any, that outperformed all other voting methods, as well as the results of any single feature selection algorithm. In the next stage, four different classification algorithms, including generalized linear model, support vector machine, naive Bayes and decision tree, were run on the dataset obtained in the feature selection process. These classifiers were combined in eight different ensemble models using the soft voting method. Using a real dataset, the experimental results show that the hybrid model based on features selected by the if_any voting method and the ensemble GLM + DT model achieves the highest performance and outperforms all other ensemble and single classifier models.
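
Soft voting, as used to combine the classifiers above, averages the class probabilities of the base models and picks the class with the highest mean. A minimal sketch follows; the probability values and the function name are illustrative and not taken from the paper, and equal weights are assumed.

```python
def soft_vote(probabilities):
    """Average per-class probabilities and return (label, averaged scores).

    probabilities: list of dicts, one per classifier, mapping label -> P(label).
    """
    labels = sorted(probabilities[0])
    averaged = {
        label: sum(p[label] for p in probabilities) / len(probabilities)
        for label in labels
    }
    return max(labels, key=lambda label: averaged[label]), averaged

# Hypothetical outputs of two models (e.g. a GLM and a decision tree).
glm = {"yes": 0.60, "no": 0.40}
tree = {"yes": 0.30, "no": 0.70}
label, scores = soft_vote([glm, tree])
print(label, scores)
```

Here the tree's confident "no" outweighs the GLM's weaker "yes", something a hard vote between two models could not resolve.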

16.
Functional Trees
In the context of classification problems, algorithms that generate multivariate trees are able to explore multiple representation languages by using decision tests based on a combination of attributes. In the regression setting, model tree algorithms explore multiple representation languages, but by using linear models at leaf nodes. In this work we study the effects of using combinations of attributes at decision nodes, leaf nodes, or both nodes and leaves in regression and classification tree learning. In order to study the use of functional nodes at different places and for different types of modeling, we introduce a simple unifying framework for multivariate tree learning. This framework combines a univariate decision tree with a linear function by means of constructive induction. Decision trees derived from the framework are able to use decision nodes with multivariate tests, and leaf nodes that make predictions using linear functions. Multivariate decision nodes are built when growing the tree, while functional leaves are built when pruning the tree. We experimentally evaluate a univariate tree, a multivariate tree using linear combinations at inner and leaf nodes, and two simplified versions restricting linear combinations to inner nodes and leaves. The experimental evaluation shows that all functional tree variants exhibit similar performance, with advantages in different datasets. In this study there is a marginal advantage of the full model. These results lead us to study the role of functional leaves and nodes. We use the bias-variance decomposition of the error, cluster analysis, and learning curves as tools for analysis. We observe that in the datasets under study and for classification and regression, the use of multivariate decision nodes has more impact on the bias component of the error, while the use of multivariate decision leaves has more impact on the variance component.

17.
In this paper, a review on condition monitoring of induction motors is first presented. Then, an ensemble of hybrid intelligent models that is useful for condition monitoring of induction motors is proposed. The review covers two parts, i.e., (i) a total of nine commonly used condition monitoring methods of induction motors; and (ii) intelligent learning models for condition monitoring of induction motors subject to single and multiple input signals. Based on the review findings, the Motor Current Signature Analysis (MCSA) method is selected for this study owing to its online, non-invasive properties and its requirement of only a single input source, therefore leading to a cost-effective condition monitoring method. A hybrid intelligent model that consists of the Fuzzy Min–Max (FMM) neural network and the Random Forest (RF) model comprising an ensemble of Classification and Regression Trees is developed. The majority voting scheme is used to combine the predictions produced by the resulting FMM–RF ensemble (or FMM–RFE) members. A benchmark problem is first deployed to evaluate the usefulness of the FMM–RFE model. Then, the model is applied to condition monitoring of induction motors using a set of real data samples. Specifically, the stator current signals of induction motors are obtained using the MCSA method. The signals are processed to produce a set of harmonic-based features for classification using the FMM–RFE model. The experimental results show good performances in both noise-free and noisy environments. More importantly, a set of explanatory rules in the form of a decision tree can be extracted from the FMM–RFE model to justify its predictions. The outcomes ascertain the effectiveness of the proposed FMM–RFE model in undertaking condition monitoring tasks, especially for induction motors, under different environments.

18.
Enlarging the feature space of the base tree classifiers in a decision forest by means of informative features extracted from an additional predictive model is advantageous for classification tasks. In this paper, we have empirically examined the performance of this type of decision forest with three different base tree classifier models: (1) the full decision tree, (2) the eight-node decision tree and (3) the two-node decision tree (or decision stump). The hybrid decision forests with these base classifiers are trained on resampled training sets of nine different sizes. We have examined the performance of all these ensembles from different points of view: we have studied the bias-variance decomposition of the misclassification error of the ensembles, and then we have investigated the amount of dependence and degree of uncertainty among the base classifiers of these ensembles using information theoretic measures. The experiment was designed to find out (1) the optimal training set size for each base classifier and (2) which base classifier is optimal for this kind of decision forest. In the final comparison, we have checked whether the subsampled version of the decision forest outperforms the bootstrapped version. All the experiments have been conducted with 20 benchmark datasets from the UCI machine learning repository. The overall results clearly point out that with careful selection of the base classifier and training sample size, the hybrid decision forest can be an efficient tool for real world classification tasks.

19.
We compare eleven methods for finding prototypes upon which to base the nearest prototype classifier. Four methods for prototype selection are discussed: Wilson+Hart (a condensation+error‐editing method), and three types of combinatorial search: random search, genetic algorithm, and tabu search. Seven methods for prototype extraction are discussed: unsupervised vector quantization, supervised learning vector quantization (with and without training counters), decision surface mapping, a fuzzy version of vector quantization, c‐means clustering, and bootstrap editing. These eleven methods can be usefully divided two other ways: by whether they employ pre‐ or postsupervision; and by whether the number of prototypes found is user‐defined or “automatic.” Generalization error rates of the 11 methods are estimated on two synthetic and two real data sets. Offering the usual disclaimer that these are just a limited set of experiments, we feel confident in asserting that presupervised, extraction methods offer a better chance for success to the casual user than postsupervised, selection schemes. Finally, our calculations do not suggest that methods which find the “best” number of prototypes “automatically” are superior to methods for which the user simply specifies the number of prototypes. © 2001 John Wiley & Sons, Inc.
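
The nearest prototype classifier that all eleven methods feed into can be sketched in a few lines. The extraction step below uses plain per-class means as a stand-in; it is not one of the eleven methods compared in the paper, and the sample data is made up.

```python
import math

def nearest_prototype(x, prototypes):
    """Classify x with the label of the closest prototype.

    prototypes: list of (vector, label) pairs.
    """
    _, label = min(prototypes, key=lambda proto: math.dist(x, proto[0]))
    return label

def class_means(samples):
    """Simplest extraction scheme: one prototype per class, the class mean."""
    grouped = {}
    for vector, label in samples:
        grouped.setdefault(label, []).append(vector)
    return [(tuple(sum(col) / len(col) for col in zip(*vectors)), label)
            for label, vectors in grouped.items()]

samples = [((0.0, 0.0), "a"), ((1.0, 0.0), "a"),
           ((5.0, 5.0), "b"), ((6.0, 5.0), "b")]
prototypes = class_means(samples)
print(nearest_prototype((0.8, 0.3), prototypes))
```

Prototype selection would instead keep a subset of the original `samples` as prototypes; the classification rule itself stays the same.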

20.
Prevention of drug dispensing errors is an important topic in medical care. In this paper, we propose a risk management approach, namely Hybrid Data Mining (HDM), to prevent drug dispensing errors. An intelligent system for preventing drug dispensing errors is then implemented based on the proposed approach. The proposed approach consists of two main procedures. In the first, classification modeling and logistic regression approaches are used to derive a decision tree and a regression function from the given dispensing error cases and drug databases. In the second, similar drugs are gathered into clusters by combining a clustering technique (PoCluster) with the extracted logistic regression function. Alerts for drugs that may cause dispensing errors are then raised based on the clustering results and the decision tree. Through experimental evaluation on real datasets in a medical center, the proposed approach was shown to be capable of discovering potential dispensing errors effectively. Hence, the proposed approach and implemented system serve as a very useful application of data mining techniques for risk management in healthcare fields.
