PCA Feature Extraction Algorithm Based on Anisotropic Gaussian Kernel Penalty
Cite this article: LIU Jun, LI Wei, CHEN Shu-Yu, XU Guang-Xia. PCA Feature Extraction Algorithm Based on Anisotropic Gaussian Kernel Penalty [J]. Journal of Software, 2022, 33(12): 4574-4589.
Authors: LIU Jun, LI Wei, CHEN Shu-Yu, XU Guang-Xia
Affiliation: School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; School of Big Data & Software Engineering, Chongqing University, Chongqing 401331, China
Funding: National Natural Science Foundation of China (61772099, 61772098); Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX0530); Chongqing "Three Hundred" Science and Technology Innovation Leading Talents Support Program (CSTCCXLJRC201917); Chongqing Innovation and Entrepreneurship Demonstration Team Cultivation Program (CSTC2017kjrc-cxcytd0063)
Abstract: This paper proposes a feature extraction algorithm based on principal component analysis (PCA) with an anisotropic Gaussian kernel penalty, which differs from traditional kernel PCA algorithms. In nonlinear dimensionality reduction, traditional kernel PCA ignores the nondimensionalization of the raw data. Moreover, the traditional kernel function is governed by a single kernel width parameter shared across all dimensions, so it cannot accurately reflect the importance of the different features in each dimension, which lowers the accuracy of the dimensionality reduction. To address these problems, a mean-normalization algorithm is first proposed for the nondimensionalization of the raw data, which markedly improves the total variance contribution rate of the original data. Second, an anisotropic Gaussian kernel function is introduced, in which each dimension has its own kernel width parameter that accurately reflects the importance of the features in that dimension. Third, a feature-penalty objective function for kernel PCA is built on the anisotropic Gaussian kernel so that the original data can be represented with fewer features while the importance of the information carried by each principal component is reflected. Finally, to find the optimal features, gradient descent is introduced to update the kernel widths in the feature-penalty objective and to control the iterations of the feature extraction algorithm. To verify the effectiveness of the proposed algorithm, the algorithms are compared on public UCI data sets and on the KDDCUP99 data set. The experimental results show that, on nine public UCI data sets, the proposed algorithm improves accuracy by 4.49% on average over the traditional PCA algorithm; on the KDDCUP99 data set, it improves accuracy by 8%.
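
The abstract gives no formulas, but its two preprocessing ingredients can be sketched in a few lines of Python. The sketch below is a minimal illustration, assuming the mean-normalization ("averaging") step divides each feature by its column mean and that the anisotropic Gaussian kernel takes the usual per-dimension-width form k(x, y) = exp(-Σ_d (x_d - y_d)² / (2σ_d²)); the function names are hypothetical and the paper's exact definitions may differ.

import numpy as np

def mean_scale(X):
    # Nondimensionalize the raw data by dividing each feature by its column mean
    # (an assumed stand-in for the paper's mean-normalization step).
    col_mean = X.mean(axis=0)
    col_mean[col_mean == 0] = 1.0                 # guard against division by zero
    return X / col_mean

def anisotropic_gaussian_kernel(X, Y, sigma):
    # k(x, y) = exp(-sum_d (x_d - y_d)^2 / (2 * sigma_d^2)), one width per dimension.
    Xs, Ys = X / sigma, Y / sigma                 # scale each dimension by its own width
    sq_dists = (np.sum(Xs ** 2, axis=1)[:, None]
                + np.sum(Ys ** 2, axis=1)[None, :]
                - 2.0 * Xs @ Ys.T)
    return np.exp(-0.5 * sq_dists)

# Example: three features on very different scales, one kernel width per feature.
X = mean_scale(np.random.rand(5, 3) * np.array([1.0, 10.0, 100.0]))
sigma = np.array([0.5, 1.0, 2.0])
K = anisotropic_gaussian_kernel(X, X, sigma)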

Keywords: anisotropic Gaussian kernel; feature penalty function; principal component analysis; gradient descent
Received: 2021-04-09
Revised: 2021-09-12

PCA Feature Extraction Algorithm Based on Anisotropic Gaussian Kernel Penalty
LIU Jun, LI Wei, CHEN Shu-Yu, XU Guang-Xia. PCA Feature Extraction Algorithm Based on Anisotropic Gaussian Kernel Penalty [J]. Journal of Software, 2022, 33(12): 4574-4589.
Authors: LIU Jun, LI Wei, CHEN Shu-Yu, XU Guang-Xia
Affiliation: School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; School of Big Data & Software Engineering, Chongqing University, Chongqing 401331, China
Abstract: This study proposes a feature extraction algorithm based on principal component analysis (PCA) with an anisotropic Gaussian kernel penalty, which differs from traditional kernel PCA algorithms. In nonlinear dimensionality reduction, traditional kernel PCA ignores the nondimensionalization of the raw data. Meanwhile, the conventional kernel function is controlled by a single kernel width parameter shared across all dimensions, which cannot precisely reflect the significance of the different features in each dimension and results in low accuracy of the dimensionality reduction process. To address these issues, an averaging (mean-normalization) algorithm is first proposed for the nondimensionalization of the raw data, which markedly improves the total variance contribution rate of the original data. Then, an anisotropic Gaussian kernel function is introduced, in which each dimension has its own kernel width parameter so that the importance of the features in that dimension can be reflected accurately. In addition, the feature-penalty objective function of kernel PCA is formulated based on the anisotropic Gaussian kernel function to represent the raw data with fewer features and to reflect the importance of the information in each principal component. Furthermore, gradient descent is introduced to update the kernel widths of the feature-penalty objective function and to control the iterative process of the feature extraction algorithm. To verify the effectiveness of the proposed algorithm, several algorithms are compared on UCI public data sets and on the KDDCUP99 data set. The experimental results show that the proposed algorithm is 4.49% more accurate on average than traditional PCA algorithms on nine UCI public data sets and 8% more accurate on the KDDCUP99 data set.
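
To make the remaining steps concrete, the self-contained sketch below shows one plausible form of the penalized kernel-PCA objective and a gradient-based update of the per-dimension kernel widths. The abstract does not state the exact feature-penalty function or its analytic gradient, so the L1 penalty, the finite-difference gradient, and the function names (aniso_kernel, penalized_objective, update_sigma) are assumptions made purely for illustration.

import numpy as np

def aniso_kernel(X, sigma):
    # Anisotropic Gaussian kernel matrix with one width per dimension.
    Xs = X / sigma
    sq = (np.sum(Xs ** 2, axis=1)[:, None]
          + np.sum(Xs ** 2, axis=1)[None, :]
          - 2.0 * Xs @ Xs.T)
    return np.exp(-0.5 * sq)

def center_kernel(K):
    # Double-center the kernel matrix, as in standard kernel PCA.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def penalized_objective(X, sigma, k, lam):
    # Illustrative objective (not the paper's exact one): variance captured by the
    # top-k kernel principal components minus an L1 penalty on the kernel widths.
    eigvals = np.sort(np.linalg.eigvalsh(center_kernel(aniso_kernel(X, sigma))))[::-1]
    explained = eigvals[:k].sum() / max(eigvals.sum(), 1e-12)
    return explained - lam * np.abs(sigma).sum()

def update_sigma(X, sigma, k=2, lam=0.01, lr=0.05, eps=1e-4, iters=100):
    # Gradient ascent on the per-dimension kernel widths; a finite-difference
    # gradient stands in for the analytic gradient derived in the paper.
    sigma = sigma.astype(float)
    for _ in range(iters):
        base = penalized_objective(X, sigma, k, lam)
        grad = np.zeros_like(sigma)
        for d in range(sigma.size):
            s = sigma.copy()
            s[d] += eps
            grad[d] = (penalized_objective(X, s, k, lam) - base) / eps
        sigma = np.maximum(sigma + lr * grad, 1e-3)   # keep the widths positive
    return sigma

# Example: learn per-feature kernel widths on a small random data set.
X = np.random.rand(40, 4)
learned_sigma = update_sigma(X, np.ones(4))

In the paper's setting, the analytic gradient of its feature-penalty objective with respect to each kernel width would replace the finite-difference loop above.
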
Keywords: anisotropic Gaussian kernel; feature penalty function; principal component analysis (PCA); gradient descent algorithm