Xgboost algorithm optimization based on gradient distribution harmonized strategy
Citation: LI Hao, ZHU Yan. Xgboost algorithm optimization based on gradient distribution harmonized strategy[J]. Journal of Computer Applications, 2020, 40(6): 1633-1637.
Authors: LI Hao, ZHU Yan
Affiliation: School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 611756, China
Fund project: Sichuan Science and Technology Program (2019YFSY0032).
Abstract: To address the low detection rate of the minority class by the ensemble learning model eXtreme Gradient Boosting (Xgboost) in binary classification, an improved Xgboost algorithm based on a gradient distribution harmonized strategy, called Loss Contribution Gradient Harmonized Algorithm (LCGHA)-Xgboost, was proposed. Firstly, Loss Contribution (LC) was defined to simulate the loss of each individual sample in the Xgboost algorithm. Secondly, Loss Contribution Density (LCD) was defined to measure how difficult a sample is to classify correctly. Finally, the gradient distribution harmonized algorithm LCGHA was proposed to dynamically adjust the first-order gradient distribution of samples according to their LCD, indirectly increasing the losses of hard samples (mainly in the minority class) and decreasing the losses of easy samples (mainly in the majority class), so that the Xgboost algorithm is biased toward learning the hard samples. Experimental results show that, compared with the three ensemble learning algorithms Xgboost, Gradient Boosting Decision Tree (GBDT) and Random_Forest, LCGHA-Xgboost improves Recall by 5.4%-16.7% and Area Under the Curve (AUC) by 0.94%-7.41% on multiple UCI datasets, and improves Recall by 44.4%-383.3% and AUC by 5.8%-35.6% on the spam web page datasets WebSpam-UK2007 and DC2010. LCGHA-Xgboost can effectively improve the detection ability for the minority class and reduce its classification error rate.
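The abstract names the LC and LCD definitions but gives no formulas, so the following is only a minimal illustrative sketch of the gradient-harmonizing idea, written as a custom objective for the xgboost Python package. It uses the absolute first-order gradient as a stand-in for a sample's loss contribution (LC), estimates its density (LCD) with a histogram, and re-weights the first-order gradients inversely to that density, amplifying hard, sparse samples and damping easy, dense ones. All helper names (N_BINS, lcgha_logistic) and the exact weighting rule are assumptions, not the paper's implementation.

# Illustrative sketch of an LCGHA-style harmonized objective for xgboost.
# The binning/weighting scheme is an assumption inspired by the abstract,
# not the authors' published code.
import numpy as np
import xgboost as xgb

N_BINS = 10  # number of histogram bins used to estimate the LCD (assumed value)

def lcgha_logistic(preds, dtrain):
    """Binary logistic objective whose first-order gradients are
    re-weighted inversely to their local density: hard samples with
    rare gradient magnitudes are amplified, easy ones are damped."""
    y = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))   # sigmoid of raw margin scores
    grad = p - y                        # first-order gradient of log loss
    hess = p * (1.0 - p)                # second-order gradient of log loss

    # |grad| lies in [0, 1] and serves as a proxy for the loss contribution (LC).
    lc = np.abs(grad)
    bins = np.minimum((lc * N_BINS).astype(int), N_BINS - 1)
    density = np.bincount(bins, minlength=N_BINS) / len(lc)   # LCD estimate

    # Inverse-density weights, normalized so the overall gradient scale is stable.
    w = 1.0 / np.maximum(density[bins], 1e-12)
    w *= len(w) / w.sum()

    # Only the first-order gradient is adjusted, matching the abstract's
    # description of harmonizing the first-order gradient distribution.
    return grad * w, hess

# Usage sketch: X is a feature matrix, y holds 0/1 labels (e.g. a UCI dataset).
# dtrain = xgb.DMatrix(X, label=y)
# booster = xgb.train({"max_depth": 6, "eta": 0.1}, dtrain,
#                     num_boost_round=100, obj=lcgha_logistic)

Leaving the second-order gradient untouched keeps the tree-split scoring of standard Xgboost intact; only the loss signal that each sample contributes to leaf weights is harmonized.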

Keywords: imbalanced classification; Xgboost; gradient distribution; loss contribution; loss contribution density
Received: 2019-11-04
Revised: 2019-12-16
