首页 | 官方网站   微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 359 毫秒
1.
在三角形隶属度函数的基础上,研究了基于模糊值的最优特征子集选取算法的不同相似性度量公式得出的类间交叠度、选取的特征子集以及该特征子集用于分类的准确性之间的关系,找到了比较适合于基于模糊值的最优特征子集选取算法的相似性度量公式.  相似文献   

2.
谢娟英  丁丽娟  王明钊 《软件学报》2020,31(4):1009-1024
基因表达数据具有高维小样本特点,包含了大量与疾病无关的基因,对该类数据进行分析的首要步骤是特征选择.常见的特征选择方法需要有类标的数据,但样本类标获取往往比较困难.针对基因表达数据的特征选择问题,提出基于谱聚类的无监督特征选择思想FSSC(feature selection by spectral clustering).FSSC对所有特征进行谱聚类,将相似性较高的特征聚成一类,定义特征的区分度与特征独立性,以二者之积度量特征重要性,从各特征簇选取代表性特征,构造特征子集.根据使用的不同谱聚类算法,得到FSSC-SD(FSSC based on standard deviation) FSSCMD(FSSC based on mean distance)和FSSC-ST(FSSC based on self-tuning)这3种无监督特征选择算法.以SVMs(support vector machines)和KNN(K-nearest neighbours)为分类器,在10个基因表达数据集上进行实验测试.结果表明,FSSC-SD、FSSC-MD和FSSC-ST算法均能选择到具有强分类能力的特征子集.  相似文献   

3.
特征选择是数据挖掘和机器学习领域中一种常用的数据预处理技术。在无监督学习环境下,定义了一种特征平均相关度的度量方法,并在此基础上提出了一种基于特征聚类的特征选择方法 FSFC。该方法利用聚类算法在不同子空间中搜索簇群,使具有较强依赖关系(存在冗余性)的特征被划分到同一个簇群中,然后从每一个簇群中挑选具有代表性的子集共同构成特征子集,最终达到去除不相关特征和冗余特征的目的。在 UCI 数据集上的实验结果表明,FSFC 方法与几种经典的有监督特征选择方法具有相当的特征约减效果和分类性能。  相似文献   

4.
在机器学习中,信息冗余和无关特征会导致较高的计算复杂度以及过拟合问题.为此,提出一种基于联盟博弈的Filter特征选择算法.采用联合互信息度量联盟与目标类的依赖程度,使用Shapley权利指数评估每个特征在整个特征空间中的重要性,选择具有最高优先权的特征子集用于分类学习.实验结果表明,在C4.5和支持向量机2种分类器下,该算法特征子集分类准确率的均值分别为88.72%、93.39%,高于mRMR算法和ReliefF算法.  相似文献   

5.
针对特征选择中存在数据缺乏类别信息的问题,提出一种新型的基于改进ReliefF的无监督特征选择方法UFS-IR.由于ReliefF类算法存在小类样本抽样概率低、无法删除冗余特征的缺陷,该方法以DBSCAN聚类算法指导分类,通过改进抽样策略,使用调整的余弦相似度度量特征间的相关性作为去冗余的凭据.实验表明UFS-IR可以有效缩减数据维度的同时保证特征子集的最大相关最小冗余性,具有很好的性能.  相似文献   

6.
已有的基于模糊粗糙集的多标记特征选择算法多从单一的样本空间刻画属性区分能力,忽视属性对标记的区分能力.基于这一认识,文中同时从样本和标记两个空间出发,提出基于双空间模糊辨识关系的多标记特征选择算法.首先,基于模糊辨识关系分别从样本和标记角度定义两种多标记属性重要性度量,然后通过权重融合的方式融合两种度量,基于融合后的度量,运用前向贪心算法构建多标记特征选择算法.在5个数据集上的对比实验验证本文算法的有效性  相似文献   

7.
粗糙集理论作为一种处理不精确和不一致数据的数学工具被广泛应用于特征子集选择和属性约简中.在大多数现存的算法中,属性依赖度被用来度量特征子集的重要性,而依赖度在处理不一致信息系统时会出现找不到任何特征子集的问题.文中讨论了使用属性依赖性作为度量的缺点和不足,引入一种一致性度量,分析了其和依赖性之间的关系,重新定义了信息系统的多余属性和约简的概念,并构造了基于一致性度量的前向贪婪搜索算法.通过UCI数据集合验证了算法能够有效地处理不一致信息系统.  相似文献   

8.
针对离散值数据集特征选择问题,提出基于相对分类信息熵的进化特征选择算法.使用遗传算法搜索最优特征子集,使用相对分类信息熵度量特征子集的重要性.以相对分类信息熵作为适应度函数,使用二进制编码问题的解,使用赌轮方法选择产生下一代个体.实验表明文中算法在测试精度上优于其它方法,此外还从理论上证明文中算法的可行性.  相似文献   

9.
高维数据中存在着大量的冗余和不相关特征,严重影响了数据挖掘的效率、质量以及机器学习算法的泛化性能,因此特征选择成为计算机科学与技术领域的重要研究方向.文中利用自编码器的非线性学习能力提出了一种无监督特征选择算法.首先,基于自编码器的重建误差选择出单个特征对数据重建贡献大的特征子集.其次,利用单层自编码器的特征权重最终选择出对其他特征重建贡献大的特征子集,通过流形正则保持原始数据空间的局部与非局部结构,并且对特征权重增加L2/1稀疏正则来提高特征权重的稀疏性,使之选择出更具区别性的特征.最后,构造一个新的目标函数,并利用梯度下降算法对所提目标函数进行优化.在6个不同类型的典型数据集上进行实验,并将所提算法与5个常用的无监督特征选择算法进行对比.实验结果验证了所提算法能够有效地选择出重要特征,显著提高了分类准确率和聚类准确率.  相似文献   

10.
马磊  罗川  李天瑞  陈红梅 《计算机应用》2023,(10):3121-3128
动态特征选择算法能够大幅提升处理动态数据的效率,然而目前基于模糊粗糙集的无监督的动态特征选择算法较少。针对上述问题,提出一种特征分批次到达情况下的基于模糊粗糙集的无监督动态特征选择(UDFRFS)算法。首先,通过定义伪三角范数和新的相似关系在已有数据的基础上进行模糊关系值的更新过程,从而减少不必要的运算过程;其次,通过利用已有的特征选择结果,在新的特征到达后,使用依赖度判断原始特征部分是否需要重新计算,以减少冗余的特征选择过程,从而进一步提高特征选择的速度。实验结果表明,UDFRFS相较于静态的基于依赖度的无监督模糊粗糙集特征选择算法,在时间效率方面能够提升90个百分点以上,同时保持较好的分类精度和聚类表现。  相似文献   

11.
This paper describes a novel feature selection algorithm for unsupervised clustering, that combines the clustering ensembles method and the population based incremental learning algorithm. The main idea of the proposed unsupervised feature selection algorithm is to search for a subset of all features such that the clustering algorithm trained on this feature subset can achieve the most similar clustering solution to the one obtained by an ensemble learning algorithm. In particular, a clustering solution is firstly achieved by a clustering ensembles method, then the population based incremental learning algorithm is adopted to find the feature subset that best fits the obtained clustering solution. One advantage of the proposed unsupervised feature selection algorithm is that it is dimensionality-unbiased. In addition, the proposed unsupervised feature selection algorithm leverages the consensus across multiple clustering solutions. Experimental results on several real data sets demonstrate that the proposed unsupervised feature selection algorithm is often able to obtain a better feature subset when compared with other existing unsupervised feature selection algorithms.  相似文献   

12.
目前已有很多针对单值信息系统的无监督特征选择方法,但针对区间值信息系统的无监督特征选择方法却很少.针对区间序信息系统,文中提出模糊优势关系,并基于此关系扩展模糊排序信息熵和模糊排序互信息,用于评价特征的重要性.再结合一种综合考虑信息量和冗余度的无监督最大信息最小冗余(UmIMR)准则,构造无监督特征选择方法.最后通过实验证明文中方法的有效性.  相似文献   

13.
A generalized hybrid unsupervised learning algorithm, which is termed as rough-fuzzy possibilistic c-means (RFPCM), is proposed in this paper. It comprises a judicious integration of the principles of rough and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in class definition, the membership function of fuzzy sets enables efficient handling of overlapping partitions. It incorporates both probabilistic and possibilistic memberships simultaneously to avoid the problems of noise sensitivity of fuzzy c-means and the coincident clusters of PCM. The concept of crisp lower bound and fuzzy boundary of a class, which is introduced in the RFPCM, enables efficient selection of cluster prototypes. The algorithm is generalized in the sense that all existing variants of c-means algorithms can be derived from the proposed algorithm as a special case. Several quantitative indices are introduced based on rough sets for the evaluation of performance of the proposed c-means algorithm. The effectiveness of the algorithm, along with a comparison with other algorithms, has been demonstrated both qualitatively and quantitatively on a set of real-life data sets.  相似文献   

14.
模糊粗糙集由于能够处理实数值数据,甚至是混合值数据中的不确定性受到人们的广泛关注,其最重要的应用之一是特征选择,相关的特征选择方法已有不少研究,但其快速的特征选择算法研究很少。实际中的数据一般含有噪声点或信息含量低的样例,如果对数据集先筛选出代表样例,再对筛选的样例集进行数据挖掘便会降低挖掘计算量。本文基于模糊粗糙集,先根据样例的模糊下近似值对样例进行筛选,然后利用筛选样例的模糊粗糙信息熵构造特征选择的评估度量,并给出相应的特征选择算法,从而降低了算法的计算复杂度。数值试验表明该快速算法具有有效性,并且对控制筛选样例个数的参数给出了建议。  相似文献   

15.
Reducing the dimensionality of the data has been a challenging task in data mining and machine learning applications. In these applications, the existence of irrelevant and redundant features negatively affects the efficiency and effectiveness of different learning algorithms. Feature selection is one of the dimension reduction techniques, which has been used to allow a better understanding of data and improve the performance of other learning tasks. Although the selection of relevant features has been extensively studied in supervised learning, feature selection in the absence of class labels is still a challenging task. This paper proposes a novel method for unsupervised feature selection, which efficiently selects features in a greedy manner. The paper first defines an effective criterion for unsupervised feature selection that measures the reconstruction error of the data matrix based on the selected subset of features. The paper then presents a novel algorithm for greedily minimizing the reconstruction error based on the features selected so far. The greedy algorithm is based on an efficient recursive formula for calculating the reconstruction error. Experiments on real data sets demonstrate the effectiveness of the proposed algorithm in comparison with the state-of-the-art methods for unsupervised feature selection.  相似文献   

16.
Unsupervised feature selection is an important problem, especially for high‐dimensional data. However, until now, it has been scarcely studied and the existing algorithms cannot provide satisfying performance. Thus, in this paper, we propose a new unsupervised feature selection algorithm using similarity‐based feature clustering, Feature Selection‐based Feature Clustering (FSFC). FSFC removes redundant features according to the results of feature clustering based on feature similarity. First, it clusters the features according to their similarity. A new feature clustering algorithm is proposed, which overcomes the shortcomings of K‐means. Second, it selects a representative feature from each cluster, which contains most interesting information of features in the cluster. The efficiency and effectiveness of FSFC are tested upon real‐world data sets and compared with two representative unsupervised feature selection algorithms, Feature Selection Using Similarity (FSUS) and Multi‐Cluster‐based Feature Selection (MCFS) in terms of runtime, feature compression ratio, and the clustering results of K‐means. The results show that FSFC can not only reduce the feature space in less time, but also significantly improve the clustering performance of K‐means.  相似文献   

17.
Fuzzy c-means (FCMs) is an important and popular unsupervised partitioning algorithm used in several application domains such as pattern recognition, machine learning and data mining. Although the FCM has shown good performance in detecting clusters, the membership values for each individual computed to each of the clusters cannot indicate how well the individuals are classified. In this paper, a new approach to handle the memberships based on the inherent information in each feature is presented. The algorithm produces a membership matrix for each individual, the membership values are between zero and one and measure the similarity of this individual to the center of each cluster according to each feature. These values can change at each iteration of the algorithm and they are different from one feature to another and from one cluster to another in order to increase the performance of the fuzzy c-means clustering algorithm. To obtain a fuzzy partition by class of the input data set, a way to compute the class membership values is also proposed in this work. Experiments with synthetic and real data sets show that the proposed approach produces good quality of clustering.  相似文献   

18.
The aim of this paper is to provide an efficient input feature selection algorithm for modeling of systems based on modified definition of fuzzy-rough sets. Some of the critical issues concerning the complexity and convergence of the feature selection algorithm are discussed in detail. Based on some natural properties of fuzzy t-norm and t-conorm operators, the concept of fuzzy-rough sets on compact computational domain is put forward, which is then utilized to construct improved Fuzzy-Rough Feature Selection algorithm. Various mathematical properties of this new definition of fuzzy-rough sets are discussed from pattern classification viewpoint. Speedup factor as high as 622 has been achieved with proposed algorithm compared to recently proposed FRSAR, with improved model performance on selected set of features.  相似文献   

19.
多标签数据广泛存在于现实世界中,多标签特征选择是多标签学习中重要的预处理步骤.基于模糊粗糙集模型,研究人员已经提出了一些多标签特征选择算法,但是这些算法大多没有关注标签之间的共现特性.为了解决这一问题,基于样本标签间的共现关系评价样本在标签集下的相似关系,利用这种关系定义了特征与标签之间的模糊互信息,并结合最大相关与最小冗余原则设计了一种多标签特征选择算法LC-FS.在5个公开数据集上进行了实验,实验结果表明了所提算法的有效性.  相似文献   

20.
Feature selection (FS) is one of the most important fields in pattern recognition, which aims to pick a subset of relevant and informative features from an original feature set. There are two kinds of FS algorithms depending on the presence of information about dataset class labels: supervised and unsupervised algorithms. Supervised approaches utilize class labels of dataset in the process of feature selection. On the other hand, unsupervised algorithms act in the absence of class labels, which makes their process more difficult. In this paper, we propose unsupervised probabilistic feature selection using ant colony optimization (UPFS). The algorithm looks for the optimal feature subset in an iterative process. In this algorithm, we utilize inter-feature information which shows the similarity between the features that leads the algorithm to decreased redundancy in the final set. In each step of the ACO algorithm, to select the next potential feature, we calculate the amount of redundancy between current feature and all those which have been selected thus far. In addition, we utilize a matrix to hold ant related pheromone which shows the rate of the co-presence of every pair of features in solutions. Afterwards, features are ranked based on a probability function extracted from the matrix; then, their m-top is returned as the final solution. We compare the performance of UPFS with 15 well-known supervised and unsupervised feature selection methods using different classifiers (support vector machine, naive Bayes, and k-nearest neighbor) on 10 well-known datasets. The experimental results show the efficiency of the proposed method compared to the previous related methods.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号