
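Improved feature selection and classification algorithm for gene expression programming based on layer distance
Cite this article: ZHAN Hang, HE Lang, HUANG Zhangcan, LI Huafeng, ZHANG Qiang, TAN Qing. Improved feature selection and classification algorithm for gene expression programming based on layer distance[J]. Journal of Computer Applications, 2021, 41(9): 2658-2667.
Authors: ZHAN Hang  HE Lang  HUANG Zhangcan  LI Huafeng  ZHANG Qiang  TAN Qing
Affiliation: 1. School of Science, Wuhan University of Technology, Wuhan 430070; 2. School of Mathematics and Statistics, Wuhan University, Wuhan 430072
Funding: General Program of the National Natural Science Foundation of China (61672391).
Abstract: General feature selection algorithms fail to reveal an interpretable mapping between data features and data classes. To address this problem, an improved feature selection and classification algorithm for GEP based on layer distance (FSLDGEP) is proposed on the basis of Gene Expression Programming (GEP) by introducing an initialization method, a mutation strategy and a fitness evaluation method. First, a defined selection probability is used to initialize the individuals of the population in a guided manner, which increases the number of effective individuals in the population. Second, the layer neighborhood of an individual is defined so that each individual mutates based on its layer neighborhood, which addresses the blind, unguided nature of mutation. Finally, the dimension reduction rate and the classification accuracy are combined as the fitness value of an individual, which changes the single-objective evolutionary mode of the population and balances the two objectives. In 5-fold and 10-fold cross-validation on 7 datasets, the proposed algorithm gives the functional mapping between data features and their classes, and the resulting mapping function is used for data classification. Compared with Feature Selection based on Forest Optimization Algorithm (FSFOA), the Neighborhood Soft Margin feature selection algorithm (NSM), Feature Selection based on Neighborhood Effective Information Ratio (FS-NEIR) and other comparison algorithms, the proposed algorithm achieves the best dimension reduction rate on the Hepatitis, WPBC (Wisconsin Prognostic Breast Cancer), Sonar and WDBC (Wisconsin Diagnostic Breast Cancer) datasets, and the best average classification accuracy on the Hepatitis, Ionosphere, Musk1, WPBC, Heart-Statlog and WDBC datasets. The experimental results verify the feasibility, effectiveness and superiority of the proposed algorithm on feature selection and classification problems.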

Keywords: feature selection  function discovery  gene expression programming  population initialization  layer neighborhood
Received: 2020-11-17
Revised: 2021-03-09

Improved feature selection and classification algorithm for gene expression programming based on layer distance
ZHAN Hang, HE Lang, HUANG Zhangcan, LI Huafeng, ZHANG Qiang, TAN Qing. Improved feature selection and classification algorithm for gene expression programming based on layer distance[J]. Journal of Computer Applications, 2021, 41(9): 2658-2667.
Authors:ZHAN Hang  HE Lang  HUANG Zhangcan  LI Huafeng  ZHANG Qiang  TAN Qing
Affiliation:1. School of Science, Wuhan University of Technology, Wuhan Hubei 430070, China;2. School of Mathematics and Statistics, Wuhan University, Wuhan Hubei 430072, China
Abstract: General feature selection algorithms do not reveal an interpretable mapping between data features and data categories. To address this problem, an improved Feature Selection classification algorithm based on Layer Distance for GEP (FSLDGEP) was proposed on the basis of Gene Expression Programming (GEP) by introducing an initialization method, a mutation strategy and a fitness evaluation method. Firstly, a selection probability was defined to initialize the individuals in the population in a guided way, so as to increase the number of effective individuals in the population. Secondly, the layer neighborhood of an individual was defined, so that each individual in the population mutates based on its layer neighborhood, which addresses the blind and unguided nature of mutation. Finally, the dimension reduction rate and the classification accuracy were combined as the fitness value of an individual, which changed the single-objective evolutionary mode of the population and balanced the two objectives. 5-fold and 10-fold cross-validation was performed on 7 datasets; the proposed algorithm gave the functional mapping between data features and their categories, and the obtained mapping function was used for data classification. Compared with Feature Selection based on Forest Optimization Algorithm (FSFOA), feature evaluation and selection based on Neighborhood Soft Margin (NSM), Feature Selection based on Neighborhood Effective Information Ratio (FS-NEIR) and other comparison algorithms, the proposed algorithm obtained the best dimension reduction rate on the Hepatitis, Wisconsin Prognostic Breast Cancer (WPBC), Sonar and Wisconsin Diagnostic Breast Cancer (WDBC) datasets, and the best average classification accuracy on the Hepatitis, Ionosphere, Musk1, WPBC, Heart-Statlog and WDBC datasets. Experimental results verify the feasibility, effectiveness and superiority of the proposed algorithm in feature selection and classification.
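The abstract states that the dimension reduction rate and the classification accuracy are combined into a single fitness value, but it does not give the formula. As a minimal sketch only, the following assumes the commonly used definition of dimension reduction rate (one minus the fraction of features kept) and a weighted sum with a hypothetical weight `alpha`; the function names, the weighted-sum form and `alpha` are illustrative assumptions, not the authors' actual fitness function.

```python
def dimension_reduction_rate(n_selected: int, n_total: int) -> float:
    """Fraction of features removed (assumed definition: 1 - selected / total)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_selected <= n_total:
        raise ValueError("n_selected must lie in [0, n_total]")
    return 1.0 - n_selected / n_total


def combined_fitness(accuracy: float, n_selected: int, n_total: int,
                     alpha: float = 0.5) -> float:
    """Weighted sum of accuracy and dimension reduction rate.

    ASSUMPTION: the weighted-sum form and the weight `alpha` are hypothetical;
    the source only says the two quantities are combined.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    dr = dimension_reduction_rate(n_selected, n_total)
    return alpha * accuracy + (1.0 - alpha) * dr


if __name__ == "__main__":
    # An individual that keeps 5 of 20 features and classifies 90% correctly.
    print(combined_fitness(accuracy=0.9, n_selected=5, n_total=20))
```

Under this assumed form, a larger `alpha` favors accuracy and a smaller one favors compact feature subsets, which is one simple way to trade off the two objectives named in the abstract.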
Keywords:feature selection  function discovery  Gene Expression Programming (GEP)  population initialization  layer neighborhood  