Similar Literature
20 similar records found (search time: 46 ms)
1.
Decision trees have been widely used in data mining and machine learning as a comprehensible knowledge representation. While ant colony optimization (ACO) algorithms have been successfully applied to extract classification rules, decision tree induction with ACO algorithms remains an almost unexplored research area. In this paper we propose a novel ACO algorithm to induce decision trees, combining commonly used strategies from both traditional decision tree induction algorithms and ACO. The proposed algorithm is compared against three decision tree induction algorithms, namely C4.5, CART and cACDT, on 22 publicly available data sets. The results show that the predictive accuracy of the proposed algorithm is statistically significantly higher than the accuracy of both C4.5 and CART, which are well-known conventional algorithms for decision tree induction, and the accuracy of the ACO-based cACDT decision tree algorithm.
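
The abstract does not reproduce the algorithm itself; as a rough illustration of how ACO machinery can drive top-down tree construction, the sketch below lets each ant pick a split attribute with probability proportional to pheromone times a heuristic (information gain is assumed here), and reinforces the attributes used by the best tree found. All function names, the beta exponent and the evaporation rate rho are illustrative assumptions, not the paper's actual design.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain of splitting on a categorical attribute index."""
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(y)
    return entropy(labels) - sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values())

def ant_choose_attribute(rows, labels, candidates, pheromone, beta=2.0):
    """ACO-style transition rule: pick a split attribute with probability
    proportional to pheromone * heuristic**beta (heuristic = information gain)."""
    weights = [pheromone[a] * (info_gain(rows, labels, a) + 1e-9) ** beta
               for a in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

def update_pheromone(pheromone, attrs_in_best_tree, quality, rho=0.1):
    """Evaporate all trails, then reinforce the attributes used by the
    iteration-best tree in proportion to its quality (e.g. hold-out accuracy)."""
    for a in pheromone:
        pheromone[a] *= (1.0 - rho)
    for a in attrs_in_best_tree:
        pheromone[a] += quality
```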

2.
The degree of malignancy in brain glioma is assessed based on magnetic resonance imaging (MRI) findings and clinical data before operation. These data contain irrelevant features, while uncertainties and missing values also exist. Rough set theory can deal with vagueness and uncertainty in data analysis, and can efficiently remove redundant information. In this paper, a rough set method is applied to predict the degree of malignancy. As feature selection can improve the classification accuracy effectively, rough set feature selection algorithms are employed to select features. The selected feature subsets are used to generate decision rules for the classification task. A rough set attribute reduction algorithm that employs a search method based on particle swarm optimization (PSO) is proposed in this paper and compared with other rough set reduction algorithms. Experimental results show that reducts found by the proposed algorithm are more efficient and can generate decision rules with better classification performance. The rough set rule-based method can achieve higher classification accuracy than other intelligent analysis methods such as neural networks, decision trees and a fuzzy rule extraction algorithm based on Fuzzy Min-Max Neural Networks (FRE-FMMNN). Moreover, the decision rules induced by rough set rule induction algorithm can reveal regular and interpretable patterns of the relations between glioma MRI features and the degree of malignancy, which are helpful for medical experts.
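
As a hedged sketch of how a PSO search can drive rough set attribute reduction, the code below encodes attribute subsets as binary masks and scores them by the rough-set dependency degree (size of the positive region) with a small penalty on subset size. The binary-PSO update is simplified to probabilistic bit flips plus drift towards the global best; the velocity equations, weights and thresholds of the paper's algorithm are not taken from the source.

```python
import random

def positive_region_size(data, labels, attrs):
    """Rough-set dependency: number of objects whose equivalence class under the
    chosen attributes is consistent, i.e. maps to a single decision label."""
    classes = {}
    for row, y in zip(data, labels):
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(y)
    return sum(1 for row in data
               if len(classes[tuple(row[a] for a in attrs)]) == 1)

def fitness(mask, data, labels, alpha=0.9):
    """Reward a high dependency degree, lightly penalise large attribute subsets."""
    attrs = [i for i, bit in enumerate(mask) if bit]
    if not attrs:
        return 0.0
    dependency = positive_region_size(data, labels, attrs) / len(data)
    return alpha * dependency + (1 - alpha) * (1 - len(attrs) / len(mask))

def pso_reduct(data, labels, n_attrs, n_particles=20, iters=50):
    """Simplified binary PSO over attribute masks: random bit flips for
    exploration plus drift towards the global best position."""
    swarm = [[random.randint(0, 1) for _ in range(n_attrs)]
             for _ in range(n_particles)]
    best = list(max(swarm, key=lambda m: fitness(m, data, labels)))
    for _ in range(iters):
        for particle in swarm:
            for j in range(n_attrs):
                if random.random() < 0.1:        # exploration
                    particle[j] ^= 1
                elif random.random() < 0.5:      # exploitation of the global best
                    particle[j] = best[j]
            if fitness(particle, data, labels) > fitness(best, data, labels):
                best = list(particle)
    return [i for i, bit in enumerate(best) if bit]
```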

3.
Induction of multiple fuzzy decision trees based on rough set technique
The integration of fuzzy sets and rough sets leads to a hybrid soft-computing technique that has been applied successfully to many fields such as machine learning, pattern recognition and image processing. The key to this soft-computing technique is how to set up and make use of fuzzy attribute reducts in fuzzy rough set theory. Given a fuzzy information system, we may find many fuzzy attribute reducts, each of which can contribute differently to decision-making. If only one of the fuzzy attribute reducts, perhaps the most important one, is selected to induce decision rules, some useful information hidden in the other reducts will unavoidably be lost. To make full use of the information provided by every individual fuzzy attribute reduct in a fuzzy information system, this paper presents a novel induction of multiple fuzzy decision trees based on rough set technique. The induction consists of three stages. First, several fuzzy attribute reducts are found by a similarity-based approach; then a fuzzy decision tree is generated for each fuzzy attribute reduct using the fuzzy ID3 algorithm. Finally, the fuzzy integral is used as a fusion tool to integrate the generated decision trees, combining the outputs of all the fuzzy decision trees into the final decision result. An illustration is given to show the proposed fusion scheme. A numerical experiment on real data indicates that the proposed multiple-tree induction is superior to single-tree induction based on an individual reduct or on the entire feature set for learning problems with many attributes.
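
To make the fusion stage concrete, the sketch below combines the per-class supports of several fuzzy decision trees with a Sugeno fuzzy integral. For brevity the importance measure over tree subsets is taken to be additive (normalised densities); the paper's lambda-fuzzy measure would replace that step. Names and the score format are assumptions.

```python
def sugeno_integral(scores, densities):
    """Sugeno fuzzy integral of the trees' supports for one class.
    scores[i]    : support of tree i for the class, in [0, 1]
    densities[i] : importance of tree i; normalised so the measure of a subset
                   is the sum of its densities (additive for brevity -- the
                   paper's lambda-fuzzy measure would replace this)."""
    total = sum(densities)
    dens = [d / total for d in densities]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fused, g = 0.0, 0.0
    for i in order:
        g += dens[i]                       # measure of the currently strongest trees
        fused = max(fused, min(scores[i], g))
    return fused

def fuse_trees(tree_outputs, densities, classes):
    """tree_outputs[i] maps class -> fuzzy support produced by tree i;
    the fused decision is the class with the largest Sugeno integral."""
    return max(classes, key=lambda c: sugeno_integral(
        [out[c] for out in tree_outputs], densities))
```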

4.
The ways to transform a wide class of machine learning algorithms into processes of plausible reasoning based on known deductive and inductive rules of inference are shown. The approach to machine learning problems is based on the concept of a good classification (diagnostic) test for a given set of positive and negative examples. The problem of inferring all good diagnostic tests is to search for the best approximations of the given classification (partitioning) of the established set of examples. The theory of algebraic lattices is used as a mathematical language for constructing algorithms that infer good classification tests. The advantage of the algebraic lattice is that it serves both as a declarative structure, i.e., a structure for knowledge representation, and as a system of dual operations used to generate elements of this structure. In this work, algorithms for inferring good tests are decomposed into subproblems and operations that correspond to the main rules of plausible human inductive and deductive reasoning. The process of plausible reasoning is considered as a sequence of three mental acts: applying a rule of reasoning (inductive or deductive) to obtain a new assertion, refining the boundaries of the reasoning domain, and choosing a new rule of reasoning (deductive or inductive).

5.
In this paper, we introduce a new adaptive rule-based classifier for multi-class classification of biological data, addressing several problems that arise when classifying such data: overfitting, noisy instances and class imbalance. Rules are a well-known, human-interpretable way of representing data. The proposed rule-based classifier combines the random subspace and boosting approaches with an ensemble of decision trees to construct a set of classification rules without global optimisation. It uses the random subspace approach to avoid overfitting, boosting to handle noisy instances, and the ensemble of decision trees to deal with class imbalance. The classifier uses two popular classification techniques: decision trees and the k-nearest-neighbour algorithm. Decision trees are used to evolve classification rules from the training data, while k-nearest-neighbour is used to analyse the misclassified instances and remove vagueness between contradictory rules. The classifier runs a series of k iterations to develop a set of classification rules from the training data, paying more attention to the misclassified instances in the next iteration, which gives it a boosting flavour. This paper particularly focuses on building an optimal ensemble classifier to improve the prediction accuracy of DNA variant identification and classification. The performance of the proposed classifier is compared with well-established machine learning and data mining algorithms on genomic data (148 exome data sets) of Brugada syndrome and on 10 real benchmark life-sciences data sets from the UCI (University of California, Irvine) machine learning repository. The experimental results indicate that the proposed classifier achieves exemplary classification accuracy on different types of biological data. Overall, the proposed classifier offers good prediction accuracy for classifying new DNA variants, where noisy and misclassified variants are handled so as to increase test performance.
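
A minimal sketch of the random-subspace-plus-boosting combination described above, using scikit-learn decision trees as the rule generators (shallow trees stand in for the rule sets extracted from them); X and y are assumed to be NumPy arrays. The subspace fraction, tree depth and AdaBoost-style weights are assumptions rather than the paper's settings, and the k-nearest-neighbour analysis of misclassified instances is omitted.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_boosted_subspace_ensemble(X, y, n_rounds=10, subspace_frac=0.5, seed=0):
    """Boosted ensemble of shallow trees, each trained on a random feature
    subspace; instance weights are raised for misclassified instances so the
    next round pays more attention to them."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(n_rounds):
        feats = rng.choice(d, size=max(1, int(subspace_frac * d)), replace=False)
        tree = DecisionTreeClassifier(max_depth=3)
        tree.fit(X[:, feats], y, sample_weight=w)
        miss = tree.predict(X[:, feats]) != y
        err = np.clip(np.sum(w * miss) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)     # AdaBoost-style tree weight
        w *= np.exp(alpha * miss)                 # up-weight misclassified instances
        w /= w.sum()
        ensemble.append((feats, tree, alpha))
    return ensemble

def predict_ensemble(ensemble, X, classes):
    """Weighted vote of the boosted subspace trees."""
    votes = np.zeros((X.shape[0], len(classes)))
    for feats, tree, alpha in ensemble:
        pred = tree.predict(X[:, feats])
        for ci, c in enumerate(classes):
            votes[:, ci] += alpha * (pred == c)
    return np.asarray(classes)[votes.argmax(axis=1)]
```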

6.
Machine Learning for Intelligent Processing of Printed Documents
A paper document processing system is an information system component which transforms information on printed or handwritten documents into a computer-revisable form. In intelligent systems for paper document processing this information capture process is based on knowledge of the specific layout and logical structures of the documents. This article proposes the application of machine learning techniques to acquire the specific knowledge required by an intelligent document processing system, named WISDOM++, that manages printed documents, such as letters and journals. Knowledge is represented by means of decision trees and first-order rules automatically generated from a set of training documents. In particular, an incremental decision tree learning system is applied for the acquisition of decision trees used for the classification of segmented blocks, while a first-order learning system is applied for the induction of rules used for the layout-based classification and understanding of documents. Issues concerning the incremental induction of decision trees and the handling of both numeric and symbolic data in first-order rule learning are discussed, and the validity of the proposed solutions is empirically evaluated by processing a set of real printed documents.

7.
A new decision tree method for application in data mining, machine learning, pattern recognition, and other areas is proposed in this paper. The new method incorporates a classical multivariate statistical method, the linear discriminant function, into the recursive partitioning process of decision trees. The proposed method considers not only the linear combination of all variables, but also combinations with fewer variables. It uses a tabu search technique to find appropriate variable combinations within a reasonable length of time. For problems with more than two classes, the tabu search technique is also used to group the data into two superclasses before each split. The results of our experimental study indicate that the proposed algorithm appears to outperform some of the major classification algorithms in terms of classification accuracy, generates decision trees of relatively small size, and runs faster than most multivariate decision trees; its computing time increases linearly with data size, indicating that the algorithm is scalable to large datasets.
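
As an illustration of a single multivariate split of the kind described, the sketch below projects a node's data (NumPy arrays assumed) onto the Fisher discriminant direction and scans candidate thresholds by Gini impurity. The tabu search over variable subsets and the two-superclass grouping are omitted; scikit-learn's LDA is used purely for convenience.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def best_lda_split(X, y):
    """One recursive-partitioning step with a linear-discriminant split:
    project onto the Fisher direction, then pick the threshold that minimises
    the weighted Gini impurity of the two children."""
    lda = LinearDiscriminantAnalysis(n_components=1)
    z = lda.fit_transform(X, y).ravel()           # 1-D projection of every instance

    def gini(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    best_t, best_score = None, np.inf
    for t in np.unique(z)[:-1]:
        left, right = y[z <= t], y[z > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return lda, best_t                            # route left when lda.transform(x) <= best_t
```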

8.
Hybridization of fuzzy GBML approaches for pattern classification problems
We propose a hybrid algorithm of two fuzzy genetics-based machine learning approaches (i.e., Michigan and Pittsburgh) for designing fuzzy rule-based classification systems. First, we examine the search ability of each approach to efficiently find fuzzy rule-based systems with high classification accuracy. It is clearly demonstrated that each approach has its own advantages and disadvantages. Next, we combine these two approaches into a single hybrid algorithm. Our hybrid algorithm is based on the Pittsburgh approach where a set of fuzzy rules is handled as an individual. Genetic operations for generating new fuzzy rules in the Michigan approach are utilized as a kind of heuristic mutation for partially modifying each rule set. Then, we compare our hybrid algorithm with the Michigan and Pittsburgh approaches. Experimental results show that our hybrid algorithm has higher search ability. The necessity of a heuristic specification method of antecedent fuzzy sets is also demonstrated by computational experiments on high-dimensional problems. Finally, we examine the generalization ability of fuzzy rule-based classification systems designed by our hybrid algorithm.

9.
Knowledge inference systems are built to identify hidden and logical patterns in large volumes of data. Decision trees play a vital role in knowledge discovery, but crisp decision tree algorithms suffer from sharp decision boundaries, which may not be appropriate for all knowledge inference systems. A fuzzy decision tree algorithm overcomes this drawback. Fuzzy decision trees are implemented by fuzzifying the decision boundaries without disturbing the attribute values. Data reduction also plays a crucial role in many classification problems. This research article presents an approach that combines principal component analysis (PCA) with a modified Gini-index-based fuzzy SLIQ decision tree algorithm: PCA is used for dimensionality reduction, and the modified Gini index fuzzy SLIQ decision tree algorithm constructs the decision rules. Finally, the method is validated by simulation experiments in MATLAB on the PID data set.
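
A hedged sketch of the PCA-plus-fuzzy-split idea: the data (NumPy arrays assumed) are projected with PCA and candidate fuzzy splits on each component are scored by a membership-weighted Gini index. The triangular membership function, the candidate centres and the scoring weights are assumptions; SLIQ's presorted attribute lists and the paper's exact modified Gini index are not reproduced.

```python
import numpy as np
from sklearn.decomposition import PCA

def triangular_membership(x, centre, width):
    """Simple triangular fuzzy set (the paper's fuzzification is not specified here)."""
    return np.clip(1.0 - np.abs(x - centre) / width, 0.0, 1.0)

def fuzzy_gini(memberships, y):
    """Gini index weighted by fuzzy memberships instead of crisp counts."""
    total = memberships.sum()
    if total == 0:
        return 1.0
    p = np.array([memberships[y == c].sum() / total for c in np.unique(y)])
    return 1.0 - np.sum(p ** 2)

def best_fuzzy_split(X, y, n_components=3):
    """Reduce dimensionality with PCA, then score candidate fuzzy splits on each
    principal component by the membership-weighted Gini index."""
    Z = PCA(n_components=min(n_components, X.shape[1])).fit_transform(X)
    best = (np.inf, None, None)
    for j in range(Z.shape[1]):
        col = Z[:, j]
        width = (col.max() - col.min()) / 2 or 1.0
        for centre in np.percentile(col, [25, 50, 75]):
            mu_low = triangular_membership(col, centre - width / 2, width)
            mu_high = 1.0 - mu_low                # complementary fuzzy branch
            score = (mu_low.sum() * fuzzy_gini(mu_low, y)
                     + mu_high.sum() * fuzzy_gini(mu_high, y)) / len(y)
            if score < best[0]:
                best = (score, j, centre)
    return best                                   # (fuzzy Gini, component index, split centre)
```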

10.
11.
We present ELEM2, a machine learning system that induces classification rules from a set of data based on a heuristic search over a hypothesis space. ELEM2 is distinguished from other rule induction systems in three aspects. First, it uses a new heuristic function to guide the heuristic search. The function reflects the degree of relevance of an attribute-value pair to a target concept and leads to selection of the most relevant pairs for formulating rules. Second, ELEM2 handles inconsistent training examples by defining an unlearnable region of a concept based on the probability distribution of that concept in the training data. The unlearnable region is used as a stopping criterion for the concept learning process, which resolves conflicts without removing inconsistent examples. Third, ELEM2 employs a new rule quality measure in its post-pruning process to prevent rules from overfitting the data. The rule quality formula measures the extent to which a rule can discriminate between the positive and negative examples of a class. We describe features of ELEM2, its rule induction algorithm and its classification procedure. We report experimental results that compare ELEM2 with C4.5 and CN2 on a number of datasets.
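
To illustrate the flavour of a relevance-driven search over attribute-value pairs, the sketch below scores each pair by P(av)·(P(c|av) − P(c)) and ranks candidates for rule formation. This scoring function is an assumed stand-in with the same intent as ELEM2's heuristic, not its published formula; the unlearnable-region stopping criterion and the rule-quality post-pruning are omitted.

```python
def pair_relevance(rows, labels, attr, value, target):
    """Relevance of an attribute-value pair to a target class, scored as
    P(av) * (P(target | av) - P(target)); an assumed stand-in for the
    heuristic function described in the abstract."""
    n = len(rows)
    has_pair = [y for row, y in zip(rows, labels) if row[attr] == value]
    if not has_pair:
        return 0.0
    p_av = len(has_pair) / n
    p_c_given_av = sum(1 for y in has_pair if y == target) / len(has_pair)
    p_c = sum(1 for y in labels if y == target) / n
    return p_av * (p_c_given_av - p_c)

def best_pairs(rows, labels, target, k=5):
    """Rank all attribute-value pairs by relevance to the target concept; a rule
    would be grown by conjoining top pairs until it covers (almost) only positives."""
    pairs = {(a, v) for row in rows for a, v in enumerate(row)}
    scored = sorted(pairs, reverse=True,
                    key=lambda p: pair_relevance(rows, labels, p[0], p[1], target))
    return scored[:k]
```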

12.
In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rules induction called AREX using evolutionary induction of decision trees and automatic programming is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of a possible new medical knowledge in the field of pediatric cardiology.

13.
In many machine learning settings, labeled examples are difficult to collect while unlabeled data are abundant. Also, for some binary classification problems, positive examples, which are elements of the target concept, are available. Can these additional data be used to improve the accuracy of supervised learning algorithms? We investigate in this paper the design of learning algorithms from positive and unlabeled data only. Many machine learning and data mining algorithms, such as decision tree induction algorithms and naive Bayes algorithms, use examples only to evaluate statistical queries (SQ-like algorithms). Kearns designed the statistical query learning model in order to describe these algorithms. Here, we design an algorithm scheme which transforms any SQ-like algorithm into an algorithm based on positive statistical queries (estimates of probabilities over the set of positive instances) and instance statistical queries (estimates of probabilities over the instance space). We prove that any class learnable in the statistical query learning model is learnable from positive statistical queries and instance statistical queries only if a lower bound on the weight of any target concept f can be estimated in polynomial time. Then, we design a decision tree induction algorithm POSC4.5, based on C4.5, that uses only positive and unlabeled examples and we give experimental results for this algorithm. In the case of imbalanced classes, in the sense that one of the two classes (say the positive class) is heavily underrepresented compared to the other class, the learning problem remains open. This problem is challenging because it is encountered in many real-world applications.
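
The reduction at the heart of this line of work can be illustrated in a few lines: any statistical query over labeled data decomposes into a query over positive examples plus a query over the instance space, once an estimate (or lower bound) of the positive-class weight is available. The sketch below implements that identity; variable names and the example query are assumptions, not the paper's notation.

```python
def estimate_query(chi, positives, unlabeled, pos_weight):
    """Estimate the statistical query E[chi(x, f(x))] using only positive and
    unlabeled examples, given an estimate `pos_weight` of P(f(x) = 1).

    Identity used (unlabeled data is drawn from the whole instance space):
        E[chi(x, f(x))] = P(f=1) * E_pos[chi(x, 1)]
                          + E_unl[chi(x, 0)] - P(f=1) * E_pos[chi(x, 0)]
    """
    e_pos_1 = sum(chi(x, 1) for x in positives) / len(positives)
    e_pos_0 = sum(chi(x, 0) for x in positives) / len(positives)
    e_unl_0 = sum(chi(x, 0) for x in unlabeled) / len(unlabeled)
    return pos_weight * e_pos_1 + (e_unl_0 - pos_weight * e_pos_0)

# Example: the kind of count a C4.5-style split needs -- the probability that
# attribute 0 equals "a" AND the example is positive (query chosen for illustration):
# chi = lambda x, label: 1.0 if (x[0] == "a" and label == 1) else 0.0
# estimate_query(chi, positives, unlabeled, pos_weight=0.3)
```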

14.
One of the known classification approaches in data mining is rule induction (RI). RI algorithms such as PRISM usually produce If-Then classifiers, which have predictive performance comparable to other traditional classification approaches such as decision trees and associative classification. Hence, these classifiers are favourable for carrying out decisions by users and can therefore be utilised as decision-making tools. Nevertheless, RI methods, including PRISM and its successors, suffer from a number of drawbacks, primarily the large number of rules derived. This can be a burden especially when the input data is highly dimensional. Therefore, pruning unnecessary rules becomes essential for the success of this type of classifier. This article proposes a new RI algorithm that reduces the search space for candidate rules by pruning irrelevant items early, during the process of building the classifier. Whenever a rule is generated, our algorithm updates the candidate item frequencies to reflect the data examples discarded along with the rules derived. This makes item frequencies dynamic rather than static and ensures that irrelevant rules are deleted at preliminary stages when they no longer have sufficient data support. The major benefit is a concise set of decision-making rules that are easy to understand and controlled by the decision maker. The proposed algorithm has been implemented in the WEKA (Waikato Environment for Knowledge Analysis) environment and hence can be utilised by different types of users such as managers, researchers, students and others. Experimental results using real data from the security domain as well as sixteen classification datasets from the University of California Irvine (UCI) repository reveal that the proposed algorithm is competitive in terms of classification accuracy when compared with known RI algorithms. Moreover, the classifiers produced by our algorithm are smaller in size, which increases their usability in practical applications.
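
A minimal separate-and-conquer sketch in the PRISM family showing where dynamic item frequencies enter: after each rule is emitted, the covered examples are removed and frequencies are recomputed over the remaining data, so low-support items are pruned early. The minimum-frequency threshold and precision-based item selection are assumptions, not the paper's exact criteria.

```python
from collections import Counter

def prism_like(rows, labels, target, min_freq=2):
    """Greedy separate-and-conquer rule learner for one target class.
    Item frequencies are recomputed on the *remaining* data after every rule,
    and items below `min_freq` are never considered again."""
    data = [(row, y) for row, y in zip(rows, labels)]
    rules = []
    while any(y == target for _, y in data):
        covered, rule = list(data), []
        while covered and any(y != target for _, y in covered):
            freq = Counter((a, v) for row, _ in covered for a, v in enumerate(row))
            candidates = [(a, v) for (a, v), c in freq.items()
                          if c >= min_freq and (a, v) not in rule]
            if not candidates:
                break

            def precision(item):
                a, v = item
                hit = [(r, y) for r, y in covered if r[a] == v]
                return sum(1 for _, y in hit if y == target) / len(hit)

            best = max(candidates, key=precision)   # most class-pure item wins
            rule.append(best)
            covered = [(r, y) for r, y in covered if r[best[0]] == best[1]]
        if not rule:
            break
        rules.append((rule, target))
        data = [(r, y) for r, y in data
                if not all(r[a] == v for a, v in rule)]   # discard covered examples
    return rules
```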

15.
孙娟  王熙照 《计算机工程》2006,32(12):210-211,231
Decision tree induction is one of the most effective tools for solving classification problems in machine learning. Because the decision tree algorithm has shortcomings of its own, appropriate simplification is needed to improve prediction accuracy. The fuzzy decision tree algorithm is an improvement on the crisp decision tree algorithm and is closer to the human way of thinking. Through experiments, this paper analyses the similarities and differences between fuzzy decision trees, rule simplification and fuzzy rule simplification, and between fuzzy decision trees and fuzzy pre-pruning algorithms, comparing tree size, training accuracy and test accuracy. The performance of fuzzy decision trees is analysed, providing some useful clues for improving the algorithm.

16.
17.
A hybrid coevolutionary algorithm for designing fuzzy classifiers
Rule learning is one of the most common tasks in knowledge discovery. In this paper, we investigate the induction of fuzzy classification rules for data mining purposes, and propose a hybrid genetic algorithm for learning approximate fuzzy rules. A novel niching method is employed to promote coevolution within the population, which enables the algorithm to discover multiple rules by means of a coevolutionary scheme in a single run. In order to improve the quality of the learned rules, a local search method is devised to fine-tune the offspring generated by the genetic operators in each generation. After the GA terminates, a fuzzy classifier is built by extracting a rule set from the final population. The proposed algorithm was tested on datasets from the UCI repository, and the experimental results verify its validity in learning rule sets and its comparative advantage over conventional methods.

18.
19.
As machine-learning-based automatic text classification has become the mainstream classification technique, such methods often neglect the effective use of rule-based classification. This paper organically combines the rule-based classification idea with machine-learning-based classification: rule matching is treated as a component classifier, and a two-layer text classification model supplemented by rules is proposed together with an optimised classification-rule learning algorithm. Based on this method, a two-layer classifier combining rules with N-Gram statistical classification was designed and implemented, and experiments comparing the two-layer model with a standalone N-Gram model show that the rule-supplemented two-layer classifier achieves better classification performance.
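
A hedged sketch of the two-layer arrangement: a rule layer is consulted first as a component classifier, and a character N-gram naive-Bayes layer decides whenever no rule fires. The keyword-rule format, n = 2 and add-one smoothing are assumptions; the paper's optimised rule-learning algorithm is not reproduced.

```python
import math
from collections import Counter, defaultdict

class TwoLayerTextClassifier:
    """Rule layer backed by a character N-gram naive-Bayes layer."""

    def __init__(self, rules, n=2):
        self.rules = rules          # list of (keyword, label): rule fires if keyword in text
        self.n = n
        self.ngram_counts = defaultdict(Counter)
        self.class_counts = Counter()

    def _ngrams(self, text):
        return [text[i:i + self.n] for i in range(len(text) - self.n + 1)]

    def fit(self, texts, labels):
        for text, y in zip(texts, labels):
            self.class_counts[y] += 1
            self.ngram_counts[y].update(self._ngrams(text))
        return self

    def predict(self, text):
        # Layer 1: rule discrimination.
        for keyword, label in self.rules:
            if keyword in text:
                return label
        # Layer 2: N-gram naive Bayes with add-one smoothing.
        grams = self._ngrams(text)
        vocab = {g for counts in self.ngram_counts.values() for g in counts}

        def log_score(y):
            total = sum(self.ngram_counts[y].values())
            prior = math.log(self.class_counts[y] / sum(self.class_counts.values()))
            return prior + sum(math.log((self.ngram_counts[y][g] + 1)
                                        / (total + len(vocab))) for g in grams)

        return max(self.class_counts, key=log_score)
```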

20.
In this paper, we propose a lazy learning strategy for building classification learning models. Instead of learning the models with the whole training data set before observing the new instance, a selection of patterns is made depending on the new query received and a classification model is learnt with those selected patterns. The selection of patterns is not homogeneous, in the sense that the number of selected patterns depends on the position of the query instance in the input space. That selection is made using a weighting function to give more importance to the training patterns that are more similar to the query instance. Our intention is to provide a lazy learning mechanism suited to any machine learning classification algorithm. For this reason, we study two different methods to avoid fixing any parameter. Experimental results show that classification rates of traditional machine learning algorithms based on trees, rules, or functions can be improved when they are learnt with the lazy learning approach proposed.
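
As a sketch of the lazy strategy described, the wrapper below defers training until a query arrives, weights the training patterns with a Gaussian kernel on their distance to the query (so the effective sample size varies with the query's position in the input space), and fits the base learner only on that weighted local sample. The kernel, the median-distance bandwidth and the cut-offs are assumptions, since the paper specifically studies how to avoid fixing such parameters; NumPy arrays are assumed.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def lazy_predict(X_train, y_train, query, base=DecisionTreeClassifier, k_max=50):
    """Lazy wrapper around any weight-aware classifier: select and weight the
    training patterns most similar to the query, then fit a local model."""
    d = np.linalg.norm(X_train - query, axis=1)
    bandwidth = np.median(d) + 1e-12                 # data-driven bandwidth
    w = np.exp(-(d / bandwidth) ** 2)                # more weight near the query
    keep = np.argsort(-w)[:k_max]                    # candidate local neighbourhood
    keep = keep[w[keep] > 0.01]                      # drop essentially irrelevant patterns
    model = base().fit(X_train[keep], y_train[keep], sample_weight=w[keep])
    return model.predict(query.reshape(1, -1))[0]
```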
