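Chen Songfeng (陈松峰), Fan Ming (范明). 《计算机科学》 (Computer Science), 2010, 37(8): 236-239, 256
This paper proposes PCABoost, a new method for building ensemble classifiers from Bayesian base classifiers. To create training samples, the feature set is randomly split into K subsets, PCA is applied to each subset to obtain its principal components, the components form a new feature space, and all training data are mapped into that space to give a new training set. Different transformations produce different feature spaces and hence several diverse training sets. On each new training set, AdaBoost builds a sequence of progressively boosted Bayesian classifiers (one classifier group), yielding several diverse classifier groups. Each group produces a prediction by weighted voting among its members, and the group predictions are then combined by voting to give the ensemble's classification, so the final ensemble has two levels of combination. Experiments were run on 30 data sets randomly selected from the UCI repository. The results show that the algorithm significantly improves the classification performance of Bayesian classifiers and, compared with ensemble methods such as Rotation Forest and AdaBoost, achieves higher classification accuracy on most of the data sets.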
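
The two-level combination described in this abstract (weighted voting inside each classifier group, then plain voting across groups) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the example labels, and the weights are all assumptions, and training of the Bayesian base classifiers is omitted.

```python
from collections import Counter, defaultdict

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight (level 1, inside a group)."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

def two_level_vote(groups):
    """Combine groups of (predictions, weights) by majority vote (level 2)."""
    group_labels = [weighted_vote(preds, wts) for preds, wts in groups]
    return Counter(group_labels).most_common(1)[0][0]

# Hypothetical outputs of three classifier groups for one test sample.
groups = [
    (["a", "b", "b"], [0.9, 0.3, 0.4]),  # weighted vote favours "a"
    (["b", "b", "a"], [0.5, 0.5, 0.6]),  # weighted vote favours "b"
    (["a", "a", "b"], [0.2, 0.3, 0.4]),  # weighted vote favours "a"
]
final_label = two_level_vote(groups)
```

In this sketch ties are broken by whichever label was seen first; the abstract does not say how ties are handled.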

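Ensemble learning, which reconstructs a strong classifier from multiple weak classifiers, is an important research direction in machine learning. Although many methods for generating diverse base classifiers have been proposed, their robustness still needs improvement. The decreasing-sample ensemble learning algorithm combines the learning ideas of boosting and bagging, currently the most popular algorithms: it repeatedly removes high-confidence samples from the training set so that the training set shrinks step by step, allowing underestimated samples to be fully trained in subsequent classifiers. This strategy produces a series of decreasing training subsets and therefore a series of diverse base classifiers. Like boosting and bagging, the method combines the base classifiers by voting. Strict ten-fold cross-validation on 8 UCI data sets with 7 kinds of base classifiers shows that the decreasing-sample ensemble learning algorithm generally outperforms boosting and bagging.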

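A new hyperspectral remote sensing image classification method is proposed that combines the random subspace method with an ensemble of kernel extreme learning machines. First, the random subspace method randomly generates multiple feature subsets of equal size from the full feature set of the hyperspectral image data. Then a kernel extreme learning machine is trained on each feature subset to obtain the base classifiers. Finally, the outputs of all base classifiers are combined by voting to obtain the classification result. Experiments on hyperspectral remote sensing image data sets show that the proposed method improves classification performance and that its overall accuracy is higher than that of the kernel extreme learning machine and of random forest.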
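
The random subspace step and the final vote from this abstract can be sketched as below. This is a hedged illustration only: the function names and parameter values are assumptions, and the kernel extreme learning machine that would be trained on each projected data set is left out.

```python
import random
from collections import Counter

def random_subspaces(n_features, subset_size, n_subsets, seed=0):
    """Draw n_subsets feature-index subsets, each of the same size."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subset_size))
            for _ in range(n_subsets)]

def project(sample, subspace):
    """Keep only the features of one sample that belong to the subspace."""
    return [sample[i] for i in subspace]

def majority_vote(labels):
    """Combine base-classifier outputs by simple voting."""
    return Counter(labels).most_common(1)[0][0]

# Assumed sizes: 10 spectral features, 5 subspaces of 4 features each.
subspaces = random_subspaces(n_features=10, subset_size=4, n_subsets=5)
```

Each subset is drawn independently, so a feature may appear in several subspaces; a base classifier would be trained on `project(sample, subspace)` for every training sample.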

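Ensemble classifiers train their member classifiers on data sets generated in the input space according to certain rules. A new kernel-based fuzzy membership method is proposed for partitioning the data set: according to the fuzzy membership of its samples, the data set is divided into a relatively hard-to-classify subset and a relatively easy-to-classify subset, and a different classifier is trained on each subset according to its difficulty. The two resulting classifiers are then used as member classifiers to build the ensemble. Applied to standard UCI data sets, the ensemble performs better than Bagging and AdaBoost in the experiments.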

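Rough subspace ensemble based on dynamic weighting   Cited by: 1 (self-citations: 0, citations by others: 1)
A rough subspace ensemble method based on dynamic weighting, EROS-DW, is proposed. Rough-set attribute reduction is used to obtain multiple reduced feature subsets, on which the base classifiers are trained. In the classification stage, each base classifier is dynamically assigned a weight according to the specific features of the given test sample, and the classifier outputs are combined by weighted voting. The method is evaluated on standard UCI data sets. The experimental results show that EROS-DW achieves higher classification accuracy than classical ensemble methods.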

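For entity recognition on high-dimensional data, a random combination ensemble classifier is proposed to exploit the rich information in high-dimensional features and improve discrimination. A classification performance index is defined for the base classifiers; classification correctness and the number of feature subsets are taken as the two design objectives, and an aggregation function converts them into a single-objective optimization problem. Ant colony optimization is used to solve for the base classifier model, with the maximal information coefficient, which measures feature correlation, serving as the heuristic information. The 谷元 distance (likely the Tanimoto distance) is used to select the base classifiers whose features differ most, and these are combined into the ensemble, whose decision function is a vote. Validation and comparison on standard data sets demonstrate the effectiveness of the method.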

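A new decision-tree-based ensemble learning method, FL (Forest Learning), is proposed. Unlike traditional ensemble learning methods such as bagging and AdaBoost, FL does not use sampling or weighted sampling; it learns a forest directly on the training set as the ensemble classifier. Traditional ensemble methods learn each base classifier independently and then combine them, whereas FL takes the effect on the ensemble into account as far as possible when learning each base classifier. FL first builds the first decision tree of the forest in the traditional way, and then builds new decision trees one at a time and adds them to the forest. When a new tree is built, every node split considers its effect on the ensemble. Experimental results show that FL builds better-performing ensemble classifiers than traditional ensemble learning methods on most data sets.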

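To improve the classification accuracy of multiple classifier systems, a classifier ensemble method based on rough-set attribute reduction, MCS_ARS, is proposed. The method uses rough-set attribute reduction and data subset partitioning to obtain several reduced feature subsets and data subsets, on which the base classifiers are trained; it then uses the similarity of classification results to obtain several predicted classes for the validation set; finally, majority voting gives the final class for the validation set. MCS_ARS is evaluated on standard UCI data sets. The experimental results show that it achieves higher classification accuracy and stability than classical ensemble methods.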

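An ensemble classification algorithm for imbalanced data sets   Cited by: 1 (self-citations: 0, citations by others: 1)
Wu Guangchao (吴广潮), Chen Qigang (陈奇刚). 《计算机工程与设计》 (Computer Engineering and Design), 2007, 28(23): 5687-5689, 5761
To improve classification performance on the minority class, an ensemble classifier algorithm based on data preprocessing is studied. Tomek links are used to preprocess the data set. The majority-class samples of the resulting data set are split into multiple subsets according to the imbalance ratio, and each subset is merged with the minority-class samples to form a new subset. A least squares support vector machine is trained on each new subset, the trained sub-classifiers are combined into one classification system, and the class of a new test sample is decided by a vote of this system. Experiments show that the algorithm outperforms least squares support vector machines with oversampling and with undersampling in classification performance on both the majority and minority classes.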
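
The splitting step in this abstract (divide the majority class by the imbalance ratio and pair every piece with the full minority class) might look like the sketch below. It rests on stated assumptions: the function name is hypothetical, the imbalance ratio is rounded up to get the number of subsets, the minority class is non-empty, and the Tomek-link preprocessing and the least squares SVM training are omitted.

```python
import math
import random

def balanced_subsets(majority, minority, seed=0):
    """Split the majority class into roughly minority-sized chunks and
    merge each chunk with all minority samples."""
    rng = random.Random(seed)
    shuffled = majority[:]            # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    # Assumption: number of subsets = imbalance ratio, rounded up.
    n_subsets = max(1, math.ceil(len(majority) / len(minority)))
    chunks = [shuffled[i::n_subsets] for i in range(n_subsets)]
    return [chunk + minority for chunk in chunks]

# Toy data: 10 majority samples and 3 minority samples give 4 subsets.
majority = [("maj", i) for i in range(10)]
minority = [("min", i) for i in range(3)]
subsets = balanced_subsets(majority, minority)
```

Each returned subset would train one sub-classifier, and the sub-classifiers would then vote on new test samples.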

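To address the problems that some multi-label text classification algorithms ignore text-term correlation and have low accuracy, an ensemble multi-label text classification method combining rotation forest and AdaBoost classifiers is proposed. First, the rotation forest algorithm partitions the sample set, and a feature transformation maps each sample subset into a new feature space, producing multiple new sample subsets with large diversity. Then, based on the AdaBoost algorithm, multiple AdaBoost base classifiers are built on the sample subsets over several iterations. Finally, the decisions of the base classifiers are fused by probability averaging to make the final label prediction. Experiments on 4 benchmark data sets show that the method performs well in terms of average precision, coverage, ranking loss, Hamming loss, and one-error.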

Rotation forest: A new classifier ensemble method   Cited by: 8 (self-citations: 0, citations by others: 8)
We propose a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to encourage simultaneously individual accuracy and diversity within the ensemble. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen here because they are sensitive to rotation of the feature axes, hence the name "forest." Accuracy is sought by keeping all principal components and also using the whole data set to train each base classifier. Using WEKA, we examined the Rotation Forest ensemble on a random selection of 33 benchmark data sets from the UCI repository and compared it with Bagging, AdaBoost, and Random Forest. The results were favorable to Rotation Forest and prompted an investigation into the diversity-accuracy landscape of the ensemble models. Diversity-error diagrams revealed that Rotation Forest ensembles construct individual classifiers which are more accurate than those in AdaBoost and Random Forest, and more diverse than those in Bagging, sometimes more accurate as well.
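
As a rough sketch of the construction described above, the code below randomly partitions the features into K disjoint subsets and places one square coefficient block per subset into a full rotation matrix. The function names are assumptions, and the blocks here are placeholder identity matrices; in the method itself they would hold the PCA coefficients computed on each feature subset, which this sketch does not compute.

```python
import random

def split_features(n_features, k, seed=0):
    """Randomly partition feature indices 0..n_features-1 into k disjoint subsets."""
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)
    return [sorted(indices[i::k]) for i in range(k)]

def assemble_rotation(subsets, blocks, n_features):
    """Place each subset's square coefficient block into an n x n matrix,
    at the rows and columns of the features that subset contains."""
    rotation = [[0.0] * n_features for _ in range(n_features)]
    for subset, block in zip(subsets, blocks):
        for r, i in enumerate(subset):
            for c, j in enumerate(subset):
                rotation[i][j] = block[r][c]
    return rotation

# 6 features, K = 3 subsets of 2 features each.
subsets = split_features(n_features=6, k=3)
# Placeholder blocks standing in for per-subset PCA coefficients.
placeholder_blocks = [[[1.0, 0.0], [0.0, 1.0]] for _ in subsets]
rotation = assemble_rotation(subsets, placeholder_blocks, n_features=6)
```

Unlike the random subspace method, every feature belongs to exactly one subset here, so no information is discarded before the rotation.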

The ensemble method is a powerful data mining paradigm, which builds a classification model by integrating multiple diversified component learners. Bagging is one of the most successful ensemble methods. It is made of bootstrap-inspired classifiers and uses these classifiers to get an aggregated classifier. However, in bagging, bootstrapped training sets become more and more similar as redundancy increases. Besides redundancy, any training set is usually subject to noise. Moreover, the training set might be imbalanced. Thus, each training instance has a different impact on the learning process. This paper explores some properties of the ensemble margin and its use in improving the performance of bagging. We introduce a new approach to measure the importance of training data in learning, based on the margin theory. Then, a new bagging method concentrating on critical instances is proposed. This method is more accurate than bagging and more robust than boosting. Compared to bagging, it reduces the bias while generally keeping the same variance. Our findings suggest that (a) examples with low margins tend to be more critical for the classifier performance; (b) examples with higher margins tend to be more redundant; (c) misclassified examples with high margins tend to be noisy examples. Our experimental results on 15 various data sets show that the generalization error of bagging can be reduced by up to 2.5% and its resilience to noise strengthened by iteratively removing both typical and noisy training instances, reducing the training set size by up to 75%.
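
The abstract does not state its margin formula, so the sketch below uses one common voting-based definition as an assumption: the votes for the true class minus the largest vote count for any other class, divided by the total number of votes. The function name is hypothetical.

```python
from collections import Counter

def ensemble_margin(votes, true_label):
    """Voting margin in [-1, 1]: positive when the true class wins the vote,
    negative when another class receives more votes."""
    counts = Counter(votes)
    correct = counts.get(true_label, 0)
    others = [n for label, n in counts.items() if label != true_label]
    best_other = max(others, default=0)
    return (correct - best_other) / len(votes)

# Hypothetical votes of a five-member ensemble on three training examples.
unanimous = ensemble_margin(["a", "a", "a", "a", "a"], "a")   # all members agree
borderline = ensemble_margin(["a", "a", "a", "b", "b"], "a")  # narrow win
wrong = ensemble_margin(["b", "b", "b", "b", "a"], "a")       # confidently misclassified
```

Under this definition, the borderline example is the kind of low-margin instance the abstract calls critical, while the unanimous one is the kind it calls redundant.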

Hybrid models based on feature selection and machine learning techniques have significantly enhanced the accuracy of standalone models. This paper presents a feature selection-based hybrid-bagging algorithm (FS-HB) for improved credit risk evaluation. Two feature selection methods, chi-square and principal component analysis, were used for ranking and selecting the important features from the data sets. The classifiers were built on 5 training and test data partitions of the input data set. The performance of the hybrid algorithm was compared with that of the standalone classifiers: feature selection-based classifiers and bagging. The hybrid FS-HB algorithm performed best on the qualitative data set with fewer features and a tree-based unstable base classifier. Its performance on numeric data was also better than that of the other standalone classifiers, and comparable to bagging with only selected features. Its performance was better on the 70:30 data partition, and the type II error, which is very significant in risk evaluation, was also reduced significantly. The improved performance of FS-HB is attributed to the important features used for developing the classifier, which reduce the complexity of the algorithm, and to the use of ensemble methodology, which added to the classical bias-variance trade-off and performed better than standalone classifiers.

Bagging, Boosting and the Random Subspace Method for Linear Classifiers   Cited by: 6 (self-citations: 0, citations by others: 6)
Recently bagging, boosting and the random subspace method have become popular combining techniques for improving weak classifiers. These techniques are designed for, and usually applied to, decision trees. In this paper, in contrast to a common opinion, we demonstrate that they may also be useful in linear discriminant analysis. Simulation studies, carried out for several artificial and real data sets, show that the performance of the combining techniques is strongly affected by the small sample size properties of the base classifier: boosting is useful for large training sample sizes, while bagging and the random subspace method are useful for critical training sample sizes. Finally, a table describing the possible usefulness of the combining techniques for linear classifiers is presented. Received: 03 November 2000, Received in revised form: 02 November 2001, Accepted: 13 December 2001

Ensemble design techniques based on training set resampling are successfully used to reduce the classification errors of the base classifiers. Boosting is one such technique, in which each training set is obtained by drawing samples with replacement from the available training set according to a weighted distribution that is modified for each new classifier to be included in the ensemble. The weighted resampling results in a set of classifiers, each accurate in different parts of the input space, mainly specified by the sample weights. In this study, a dynamic integration of boosting-based ensembles is proposed so as to take into account the heterogeneity of the input sets. An evidence-theoretic framework is developed for this purpose so as to take into account the weights and distances of the neighboring training samples in both training and testing boosting-based ensembles. The effectiveness of the proposed technique is compared to the AdaBoost algorithm using three different base classifiers.

Boosting is a set of methods for the construction of classifier ensembles. The distinguishing feature of these methods is that they make it possible to obtain a strong classifier from the combination of weak classifiers. Therefore, it is possible to use boosting methods with very simple base classifiers. Among the simplest classifiers are decision stumps, decision trees with only one decision node.

This work proposes a variant of the most well-known boosting method, AdaBoost. It is based on considering, as the base classifiers for boosting, not only the last weak classifier, but a classifier formed by the last r selected weak classifiers (r is a parameter of the method). If the weak classifiers are decision stumps, the combination of r weak classifiers is a decision tree.

The ensembles obtained with the variant are formed by the same number of decision stumps as the original AdaBoost. Hence, the original version and the variant produce classifiers with very similar sizes and computational complexities (for training and classification). The experimental study shows that the variant is clearly beneficial.
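
For reference, the baseline that this abstract modifies can be sketched as follows: the original discrete AdaBoost with decision stumps on labels +1 and -1. This is not the proposed variant (which groups the last r weak classifiers into a tree); the function names and the toy data are assumptions chosen so that no single stump can separate the classes.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    """A decision stump: one threshold test on one feature."""
    return polarity if row[feature] <= threshold else -polarity

def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, f, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, threshold, polarity)
    return best

def adaboost(X, y, rounds):
    """Return a list of (alpha, feature, threshold, polarity) stumps."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, f, threshold, polarity = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # weight of this stump
        ensemble.append((alpha, f, threshold, polarity))
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, threshold, polarity))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data: the negative class sits between two positive regions.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
predictions = [predict(ensemble, row) for row in X]
```

On this toy data one stump misclassifies two of the six points, while the weighted vote of three stumps reproduces all six labels, which illustrates the strong-from-weak property the abstract refers to.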

Ensembles of classifiers that are trained on different parts of the input space provide good results in general. As a popular boosting technique, AdaBoost is an iterative, gradient-based deterministic method used for this purpose, in which an exponential loss function is minimized. Bagging is a random search-based ensemble creation technique in which the training set of each classifier is arbitrarily selected. In this paper, a genetic algorithm-based ensemble creation approach is proposed in which both resampled training sets and classifier prototypes evolve so as to maximize the combined accuracy. The objective function-based random search procedure of the resultant system, guided by both ensemble accuracy and diversity, can be considered to share the basic properties of bagging and boosting. Experimental results show that the proposed approach provides better combined accuracies using fewer classifiers than AdaBoost.

On the algorithmic implementation of stochastic discrimination   Cited by: 4 (self-citations: 0, citations by others: 4)
Stochastic discrimination is a general methodology for constructing classifiers appropriate for pattern recognition. It is based on combining arbitrary numbers of very weak components, which are usually generated by some pseudorandom process, and it has the property that the very complex and accurate classifiers produced in this way retain the ability, characteristic of their weak component pieces, to generalize to new data. In fact, it is often observed, in practice, that classifier performance on test sets continues to rise as more weak components are added, even after performance on training sets seems to have reached a maximum. This is predicted by the underlying theory, for even though the formal error rate on the training set may have reached a minimum, more sophisticated measures intrinsic to this method indicate that classifier performance on both training and test sets continues to improve as complexity increases. We begin with a review of the method of stochastic discrimination as applied to pattern recognition. Through a progression of examples keyed to various theoretical issues, we discuss considerations involved with its algorithmic implementation. We then take such an algorithmic implementation and compare its performance, on a large set of standardized pattern recognition problems from the University of California Irvine and Statlog collections, to many other techniques reported on in the literature, including boosting and bagging. In doing these studies, we compare our results to those reported in the literature by the various authors for the other methods, using the same data and study paradigms used by them. Included in the paper is an outline of the underlying mathematical theory of stochastic discrimination and a remark concerning boosting, which provides a theoretical justification for properties of that method observed in practice, including its ability to generalize.
