Similar literature
 Found 20 similar documents (search time: 156 ms)
1.
Li ZHANG  Cong WANG 《通信学报》2018,39(5):111-122
Feature selection has played an important role in machine learning and artificial intelligence in the past decades. Many existing feature selection algorithms choose some redundant and irrelevant features, which leads to overestimation of some features. Moreover, more features significantly slow down machine learning and lead to classification over-fitting. Therefore, a new nonlinear feature selection algorithm based on forward search was proposed. The algorithm used mutual information theory to find the optimal subset associated with multi-task labels and reduced the computational complexity. In experiments on nine UCI datasets with four different classifiers, the feature subsets selected by the proposed algorithm outperformed both the original feature set and the subsets selected by other feature selection algorithms.
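
The idea of a forward search driven by mutual information can be illustrated with a minimal sketch. This is not the authors' algorithm: the scoring rule (relevance minus mean redundancy, in the style of mRMR), the function names `mutual_information` and `forward_select`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def forward_select(columns, labels, k):
    """Greedy forward search (illustrative criterion, not from the paper):
    at each step add the feature with the highest relevance to the labels
    minus its mean redundancy with the features already selected."""
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            relevance = mutual_information(columns[j], labels)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the label is determined jointly by f0 and f2; f1 duplicates f0.
f0 = [0, 0, 1, 1, 0, 0, 1, 1]
f1 = list(f0)
f2 = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [2 * a + b for a, b in zip(f0, f2)]

print(forward_select([f0, f1, f2], labels, 2))
```

On this toy data the duplicate feature `f1` is as relevant as `f0` but fully redundant with it, so the second pick goes to `f2`.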

2.
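Space-ground integrated networks operate in an open electromagnetic environment and are frequently subjected to malicious network intrusions. To address attacks in which unauthorized behavior bypasses security mechanisms, an improved genetic algorithm is proposed. The algorithm uses a decision tree algorithm as its fitness function and, by removing redundant features from the dataset, significantly raises the interception rate for network attacks. Anomalies are classified with machine learning, and the feature selection capability of the genetic algorithm is used to improve the classification efficiency of the machine learning methods. To verify the effectiveness of the algorithm, the UNSW_NB15 and UGRansome1819 datasets are used for training and testing. Four machine learning classifiers (random forest, artificial neural network, K-nearest neighbors, and support vector machine) are evaluated, with accuracy, F1 score, recall, and confusion matrices as performance metrics. The experiments show that using the genetic algorithm as a feature selection tool significantly improves classification accuracy and overall algorithm performance. In addition, to address the instability of weak classifiers, an ensemble learning optimization technique that integrates weak and strong classifiers is proposed. Experiments confirm that this optimization markedly improves the stability of weak classifiers.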
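
A genetic algorithm used as a feature selector can be sketched as follows. This is a generic illustration under stated assumptions, not the improved algorithm of the abstract: the abstract uses a decision tree as the fitness function, whereas `toy_fitness` below is a hypothetical stand-in so that the example stays self-contained, and the operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and all parameter values are assumptions.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              mutation_rate=0.05, seed=0):
    """Evolve binary masks (1 = keep feature); requires n_features >= 2."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament():
        # Pick two individuals at random and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with a small per-gene probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


# Hypothetical fitness: reward "useful" features, penalize subset size.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask = genetic_feature_selection(8, toy_fitness)
print(best_mask, toy_fitness(best_mask))
```

In a real setting the fitness call would train and validate a classifier on the masked feature set, which is what makes wrapper-style selection of this kind expensive.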

3.
Evolutionary Rough Feature Selection in Gene Expression Data (cited by: 1; self-citations: 0; citations by others: 1)
An evolutionary rough feature selection algorithm is used for classifying microarray gene expression patterns. Since the data typically consist of a large number of redundant features, an initial redundancy reduction of the attributes is done to enable faster convergence. Rough set theory is employed to generate reducts, which represent the minimal sets of nonredundant features capable of discerning between all objects, in a multiobjective framework. The effectiveness of the algorithm is demonstrated on three cancer datasets.

4.
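Mutual information is a commonly used evaluation function for feature selection, but studies show that it leads to relatively low classification accuracy. To address the tendency of mutual information to select low-frequency words, this paper proposes a new feature evaluation function, TFMIIE, which combines information entropy with an improved mutual information measure: the improved mutual information avoids the bias toward rare low-frequency words, while the feature entropy helps remove feature words whose class membership is uncertain. Experimental results show that when TFMIIE is used for feature selection and the resulting feature subset is used to represent texts and build classifiers, the precision and recall of text classification improve by about 40% over the mutual information method, which confirms the effectiveness of the proposed text feature selection method based on improved mutual information and information entropy.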
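
The low-frequency bias mentioned above can be seen in a small worked example. The sketch below uses the standard pointwise mutual information between a term and a class, computed from document counts; it does not implement TFMIIE, whose definition is not given in the abstract. The function name `pointwise_mi` and the counts are hypothetical.

```python
from math import log2


def pointwise_mi(a, b, c, d):
    """PMI(term, class) in bits from document counts (requires a > 0):
    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    # log2( P(term, class) / (P(term) * P(class)) )
    return log2(a * n / ((a + c) * (a + b)))


# A term seen in a single document of the class versus a frequent,
# clearly class-indicative term (100 documents, 50 per class).
rare = pointwise_mi(1, 0, 49, 50)
common = pointwise_mi(40, 10, 10, 40)
print(rare, common)
```

The term that occurs once scores higher than the term that occurs in 40 of the 50 class documents, which is the kind of behavior an improved measure would need to correct.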

5.
张俐  陈小波 《电子与信息学报》2021,43(10):3028-3034
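Feature selection is an indispensable data preprocessing step in machine learning, natural language processing, data mining, and related fields. In some information-theoretic feature selection algorithms, choosing different parameter values amounts to choosing different feature selection algorithms. How to determine dynamic, non-prior weights and avoid preset prior parameters has therefore become an urgent problem. This paper proposes a dynamically Weighted Maximum Relevance and maximum Independence (WMRI) feature selection algorithm. First, the algorithm computes the mean of the new classification information and the mean of the retained class information. Second, it uses the standard deviation to dynamically adjust the parameter weights of these two kinds of classification information. Finally, WMRI is compared with five other feature selection algorithms on 10 different datasets with three classifiers, using the classification accuracy metric (fmi). The experimental results show that WMRI improves the quality of the feature subset and raises classification accuracy.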

6.
Battiti's mutual information feature selector (MIFS) and its variants are used in many classification applications. Because they ignore feature synergy, MIFS and its variants may introduce a large bias when features act in combination. Moreover, they estimate feature redundancy without regard to the corresponding classification task. In this paper, we propose an automated greedy feature selection algorithm called conditional mutual information-based feature selection (CMIFS). Based on the link between interaction information and conditional mutual information, CMIFS takes account of both redundancy and synergy interactions of features and identifies discriminative features. In addition, CMIFS combines feature redundancy evaluation with classification tasks. This decreases the probability of mistaking important features for redundant ones during the search. The experimental results show that CMIFS achieves a higher best classification accuracy than MIFS and its variants, with the same or fewer (nearly 50%) features.
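
Feature synergy and its link to conditional mutual information can be illustrated with the XOR problem. The sketch below is not CMIFS itself; it only computes the standard quantities I(X;Y) and I(X;Y|Z) from empirical entropies, and the helper names are chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (bits) of one or more discrete sequences."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


# XOR: neither feature alone says anything about y, but together they fix it.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]

mi = mutual_information(x1, y)
cmi = conditional_mutual_information(x1, y, x2)
print(mi, cmi)
```

Here I(x1; y) is 0 bits while I(x1; y | x2) is 1 bit. A criterion that scores features only by their individual mutual information with the class would discard both XOR inputs, whereas a conditional criterion can detect that they are jointly informative.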

7.
Feature selection is one of the important topics in text classification. However, most existing feature selection methods are serial and too inefficient to apply to massive text data sets. In this case, a feature selection method based on a parallel collaborative evolutionary genetic algorithm is presented. The method uses a genetic algorithm to select feature subsets and takes advantage of parallel collaborative evolution to improve time efficiency, so it can quickly acquire more representative feature subsets. The experimental results show that, in terms of accuracy ratio and recall ratio, the presented method is better than the information gain, χ² statistic, and mutual information methods; with only one CPU the presented method takes longer than these three methods, but it is faster once the parallel strategy is used.

8.
Software Birthmark Feature Selection Based on Cluster Analysis (cited by: 1; self-citations: 0; citations by others: 1)
罗养霞  房鼎益 《电子学报》2013,41(12):2334-2338
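The choice of software birthmark affects the software recognition rate. This paper applies constrained clustering to analyze software features, measures intra-class and inter-class feature distances with mutual information, and builds an information gain function and a penalty function from the features of same-class and different-class software, thereby selecting software birthmark features with high class-discriminative information and minimal redundancy. Analysis and comparison show that the algorithm provides an effective approach to selecting and optimizing software birthmark features.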

9.
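To improve the accuracy and speed of damage identification in aeronautical structures, a multi-agent decision fusion method based on mutual-information classifier selection is proposed. First, samples are acquired and the preselected pattern classifiers are trained and tested to obtain their confusion matrices. Then a mutual-information-based classifier correlation index is used to select the optimal classifiers, yielding the optimal classifier combination. Finally, the multi-agent decision fusion method performs the final identification for the system. Experiments on an aluminum stiffened aircraft panel show that the proposed ...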
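
One mutual information quantity that can be read directly from a confusion matrix is the information shared between the true and the predicted labels. The sketch below computes it. Whether the paper's classifier correlation index is defined this way is not stated in the truncated abstract, so the function `confusion_mutual_information` and the example matrices are assumptions for illustration only.

```python
from math import log2


def confusion_mutual_information(matrix):
    """Mutual information (bits) between true labels (rows) and predicted
    labels (columns), estimated from the counts in a confusion matrix."""
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    mi = 0.0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if count:  # empty cells contribute nothing
                mi += (count / total) * log2(
                    count * total / (row_sums[i] * col_sums[j])
                )
    return mi


perfect = [[5, 0], [0, 5]]   # every sample classified correctly
chance = [[5, 5], [5, 5]]    # predictions independent of the truth

print(confusion_mutual_information(perfect))
print(confusion_mutual_information(chance))
```

A perfect two-class classifier on balanced data reaches 1 bit, and a classifier whose outputs are independent of the truth scores 0 bits.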

10.
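To address the high feature redundancy and low generality of existing JPEG steganalysis methods based on feature fusion, a universal JPEG steganalysis method based on an improved boosting feature selection (BFS) algorithm is proposed. Feature redundancy is reduced in terms of both linear and nonlinear correlation: two statistics, the feature autocorrelation coefficient and mutual information, are introduced into the feature evaluation criterion, the feature weight calculation is redesigned, and the feature evaluation function of the BFS algorithm is improved. With the improved BFS feature selection algorithm, three groups of highly complementary, high-accuracy features are fused and reduced in dimension, and the resulting optimal feature subset is used to train the classifier. Extensive experiments were carried out at different embedding rates on three highly covert steganographic algorithms: F5, Outguess, and MME3. The results show that the detection accuracy of the proposed method exceeds that of existing high-detection-rate JPEG steganalysis methods and typical fusion-based methods, that the correlation among the fused features drops markedly, and that the method generalizes better.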

11.
This paper investigates variable selection (VS) and classification for biomedical datasets with a small sample size and a very high input dimension. The sequential sparse Bayesian learning methods with linear bases are used as the basic VS algorithm. Selected variables are fed to the kernel-based probabilistic classifiers: Bayesian least squares support vector machines (BayLS-SVMs) and relevance vector machines (RVMs). We employ bagging techniques for both VS and model building in order to improve the reliability of the selected variables and the predictive performance. This modeling strategy is applied to real-life medical classification problems, including two binary cancer diagnosis problems based on microarray data and a brain tumor multiclass classification problem using spectra acquired via magnetic resonance spectroscopy. The work is experimentally compared to other VS methods. It is shown that the use of bagging can improve the reliability and stability of both VS and model prediction.

12.
13.
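This paper proposes a decision-level fusion target recognition algorithm based on grey situation decision-making. It uses the dynamic information contained in the decisions of each sub-source sensor and weights each target class by computing grey relational coefficients. In the experiments, the method was used to fuse the classification results for radar observation data of five target classes; the results show that its target recognition performance is effectively improved relative to the sub-source sensors.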

14.
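To address the large classification errors caused by the high dimensionality and small sample size of biological omics data, an ensemble feature selection method based on constrained niching binary particle swarm optimization is proposed. The method uses binary particle swarm optimization to search for the feature subset with the highest classification accuracy, limits the number of selected features by constraining the number of set bits in the particle encoding, and adds a niching technique from multimodal optimization so that a single run yields several feature subsets that differ substantially from one another. Finally, ensemble learning combines the base classifiers built on the multiple feature subsets into a strong classifier, which is used for classification. Experimental results show that the method stably selects a small number of features on biological omics data and achieves good classification performance.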

15.
Feature selection algorithm based on XGBoost (cited by: 2; self-citations: 0; citations by others: 2)
Feature selection in classification has always been an important but difficult problem. It requires feature selection algorithms not only to help classifiers improve classification accuracy, but also to remove as many redundant features as possible. Therefore, to better solve feature selection in classification problems, a new wrapper feature selection algorithm, XGBSFS, was proposed. It borrows the tree-building process of XGBoost and measures feature importance with three importance metrics to avoid the limitation of a single metric. Then the improved sequential floating forward selection (ISFFS) was applied to search for a high-quality feature subset. In experiments on eight UCI datasets, the proposed algorithm performs well.

16.
Research on Feature Selection Algorithms in Chinese Text Classification (cited by: 34; self-citations: 0; citations by others: 34)
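Seven feature selection algorithms commonly used in text classification are compared: document frequency, information gain, mutual information, the χ² statistic, expected cross entropy, weight of evidence for text, and odds ratio. The algorithms are each evaluated on the Chinese text corpus of the National "863 Program" with a Rocchio classifier. The results show that the odds ratio method outperforms the other feature selection algorithms.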
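
Two of the seven criteria, the χ² statistic and the odds ratio, can be computed from a 2×2 term/class contingency table. The sketch below uses the textbook definitions; the exact variants and any smoothing used in the paper are not given in the abstract, so the function names, the absence of zero-count smoothing, and the example counts are assumptions.

```python
from math import log


def chi_square(a, b, c, d):
    """Chi-square statistic of a term for a class, from document counts:
    a: in class, contains term     b: not in class, contains term
    c: in class, lacks term        d: not in class, lacks term
    """
    n = a + b + c + d
    return n * (a * d - c * b) ** 2 / ((a + c) * (b + d) * (a + b) * (c + d))


def odds_ratio(a, b, c, d):
    """Log odds ratio of the term between the class and the rest.
    No smoothing is applied, so all four counts must be positive."""
    p_pos = a / (a + c)   # P(term | class)
    p_neg = b / (b + d)   # P(term | not class)
    return log(p_pos * (1 - p_neg) / ((1 - p_pos) * p_neg))


# Hypothetical counts: 100 documents, 50 per class; the term occurs in
# 40 documents of the class and in 10 documents outside it.
print(chi_square(40, 10, 10, 40))
print(odds_ratio(40, 10, 10, 40))
```

For these counts the χ² statistic is 36 and the odds ratio is the natural logarithm of 16. Both scores grow as the term becomes more concentrated in one class, but the odds ratio looks only at the two conditional probabilities, while χ² also reflects the total number of documents.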

17.
In this paper we propose a strategy to create ensembles of classifiers based on unsupervised feature selection. It takes into account a hierarchical multi-objective genetic algorithm that generates a set of classifiers by performing feature selection and then combines them to provide a set of powerful ensembles. The proposed method is evaluated in the context of handwritten month word recognition, using three different feature sets and Hidden Markov Models as classifiers. Comprehensive experiments demonstrate the effectiveness of the proposed strategy.

18.
Multichannel EEG is generally used in brain-computer interfaces (BCIs), whereby performing EEG channel selection 1) improves BCI performance by removing irrelevant or noisy channels and 2) enhances user convenience through the use of fewer channels. This paper proposes a novel sparse common spatial pattern (SCSP) algorithm for EEG channel selection. The proposed SCSP algorithm is formulated as an optimization problem to select the least number of channels within a constraint of classification accuracy. As such, the proposed approach can be customized to yield the best classification accuracy by removing the noisy and irrelevant channels, or to retain the least number of channels without compromising the classification accuracy obtained by using all the channels. The proposed SCSP algorithm is evaluated using two motor imagery datasets, one with a moderate number of channels and another with a large number of channels. In both datasets, the proposed SCSP channel selection significantly reduced the number of channels, and outperformed existing channel selection methods based on Fisher criterion, mutual information, support vector machine, common spatial pattern, and regularized common spatial pattern in classification accuracy. The proposed SCSP algorithm also yielded an average improvement of 10% in classification accuracy compared to the use of three channels (C3, C4, and Cz).

19.
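Feature selection is an important step in target classification and directly affects the design and performance of the classifier. Using measured radiated-noise data from real underwater acoustic targets, this paper analyzes two feature selection methods: the genetic algorithm and the mutual information algorithm. When the feature dimension is large, both methods require long computation times, so a hybrid genetic and mutual information algorithm is proposed to reduce computation time. Finally, the feature subsets selected by the three methods are used as classifier inputs, and the classification results are compared with those obtained from arbitrarily chosen feature subsets.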

20.
This paper develops a new method to generate ensembles of classifiers that uses all available data to construct every individual classifier. The base algorithm builds a decision tree in an iterative manner: The training data are divided into two subsets. In each iteration, one subset is used to grow the decision tree, starting from the decision tree produced by the previous iteration. This fully grown tree is then pruned by using the other subset. The roles of the data subsets are interchanged in every iteration. This process converges to a final tree that is stable with respect to the combined growing and pruning steps. To generate a variety of classifiers for the ensemble, we randomly create the subsets needed by the iterative tree construction algorithm. The method exhibits good performance in several standard datasets at low computational cost.
