Unsupervised feature selection method based on regularized mutual representation
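Citation: WANG Zhiyuan, JIANG Ailian, MUHAMMAD Osman. Unsupervised feature selection method based on regularized mutual representation[J]. Journal of Computer Applications, 2020, 40(7): 1896-1900. DOI: 10.11772/j.issn.1001-9081.2019122075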
Authors: WANG Zhiyuan, JIANG Ailian, MUHAMMAD Osman
Affiliation: College of Information and Computer, Taiyuan University of Technology, Jinzhong Shanxi 030600, China
Foundation item: Shanxi Province Research Funding Program for Returned Overseas Scholars (2017-051).
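Abstract: Redundant features in high-dimensional data degrade the training efficiency and generalization ability of machine learning models. To improve pattern recognition accuracy and reduce computational complexity, an unsupervised feature selection method based on the Regularized Mutual Representation (RMR) property was proposed. Firstly, the correlations between features were used to build a mathematical model for unsupervised feature selection constrained by the Frobenius norm. Then, a divide-and-conquer ridge regression optimization algorithm was designed to optimize the model quickly. Finally, the importance of each feature was evaluated jointly from the optimal solution of the model, and a representative feature subset was selected from the original data. In clustering accuracy, RMR outperforms the Laplacian method by 7 percentage points, the Nonnegative Discriminative Feature Selection (NDFS) method by 7 percentage points, the Regularized Self-Representation (RSR) method by 6 percentage points, and the Self-Representation Feature Selection (SR_FS) method by 3 percentage points. In data redundancy rate, RMR is lower than the Laplacian method by 10 percentage points, NDFS by 7 percentage points, RSR by 3 percentage points, and SR_FS by 2 percentage points. The experimental results show that RMR can effectively select important features, reduce the data redundancy rate, and improve the clustering accuracy of samples.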
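The abstract describes the pipeline only at a high level, so the following is a minimal NumPy sketch under stated assumptions, not the authors' RMR algorithm. It assumes the model min_W ||X - XW||_F^2 + lam * ||W||_F^2 with diag(W) = 0 (each feature reconstructed from the others), interprets "divide-and-conquer" as splitting that objective into one independent ridge regression per feature column, and ranks features by the row norms of W. The function names `rmr_scores`, `select_features`, and `redundancy_rate` and the parameter `lam` are hypothetical. The redundancy rate uses one common definition (mean absolute Pearson correlation over distinct pairs of selected features); the paper's exact definition may differ.

```python
import numpy as np


def rmr_scores(X, lam=1.0):
    """Hypothetical mutual-representation feature scoring (assumed model).

    Assumed objective (not taken from the paper):
        min_W ||X - X W||_F^2 + lam * ||W||_F^2   subject to  diag(W) = 0
    The Frobenius objective separates by column, so each feature j is fit
    by its own ridge regression on the remaining features ("divide"), and
    the per-column solutions are assembled into W ("conquer").
    """
    if lam <= 0:
        raise ValueError("lam must be positive so each ridge system is nonsingular")
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)                      # center each feature (column)
    _, d = X.shape
    W = np.zeros((d, d))
    for j in range(d):
        others = [k for k in range(d) if k != j]
        A = X[:, others]                        # n x (d-1): all features except j
        # Closed-form ridge solution: (A^T A + lam I)^{-1} A^T x_j
        w = np.linalg.solve(A.T @ A + lam * np.eye(d - 1), A.T @ X[:, j])
        W[others, j] = w                        # diagonal entry W[j, j] stays 0
    # Row k of W holds the weights feature k contributes to every reconstruction
    scores = np.linalg.norm(W, axis=1)
    return scores, W


def select_features(scores, m):
    """Indices of the m highest-scoring features."""
    return np.argsort(-scores)[:m]


def redundancy_rate(X, selected):
    """Mean absolute Pearson correlation over distinct pairs of selected features."""
    S = np.asarray(X, dtype=float)[:, selected]
    m = S.shape[1]
    if m < 2:
        return 0.0
    C = np.abs(np.corrcoef(S, rowvar=False))    # columns are variables
    return (C.sum() - np.trace(C)) / (m * (m - 1))
```

Because the assumed Frobenius objective decouples by column, the d ridge solves are independent of one another and could run in parallel; each costs one (d-1) x (d-1) linear solve. Note that this simple row-norm score does not by itself penalize picking two near-duplicate features, which is why a redundancy measure is reported alongside accuracy.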

Keywords: feature selection; unsupervised learning; divide-and-conquer algorithm; ridge regression; regularization
Received: 2019-12-09
Revised: 2020-02-24

This article is indexed in Wanfang Data and other databases.