
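An Oversampling Random Forest Algorithm for Imbalanced Data Classification
Cite this article:Zhao Jinyang, Lu Huiguo, Jiang Juanping, Yuan Peipei, Liu Xueli. An oversampling random forest algorithm for imbalanced data classification [J]. Computer Applications and Software, 2019(4): 255-261, 316.
Authors:Zhao Jinyang  Lu Huiguo  Jiang Juanping  Yuan Peipei  Liu Xueli
Affiliations:1. College of Electronic Engineering, Chengdu University of Information Technology; 2. Key Laboratory of Atmospheric Sounding, China Meteorological Administration; 3. School of Aeronautics and Astronautics, University of Electronic Science and Technology of China; 4. College of Information Engineering, Nanjing University of Finance and Economics
Funding:Key Science and Technology Program of the Sichuan Provincial Department of Education (14ZA0170)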
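Abstract:Imbalanced datasets arise frequently in areas such as severe weather, fault diagnosis, network attacks, and financial fraud. To address the poor classification performance of the random forest algorithm on imbalanced datasets, a new oversampling method is proposed: the SCSMOTE (Seed Center Synthetic Minority Over-sampling Technique) algorithm. The key idea is to identify suitable candidate samples among the minority class samples of the dataset, compute the center of those candidates, and generate new minority class samples between the candidates and the center, thereby controlling the quality of the synthesized minority samples. SCSMOTE is combined with the random forest algorithm to handle imbalanced datasets, and comparative experiments on UCI datasets show that the combined algorithm effectively improves the classification performance of random forest on imbalanced datasets.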

Keywords:Imbalanced dataset  Minority class  Synthetic sample  Classification

AN OVERSAMPLING RANDOM FOREST ALGORITHM FOR CLASSIFICATION OF IMBALANCE DATA
Zhao Jinyang,Lu Huiguo,Jiang Juanping,Yuan Peipei,Liu Xueli.AN OVERSAMPLING RANDOM FOREST ALGORITHM FOR CLASSIFICATION OF IMBALANCE DATA[J].Computer Applications and Software,2019(4):255-261,316.
Authors:Zhao Jinyang  Lu Huiguo  Jiang Juanping  Yuan Peipei  Liu Xueli
Affiliation:(College of Electronic Engineering, Chengdu University of Information Technology, Chengdu 610225, Sichuan, China; Key Laboratory of Atmospheric Sounding of CMA, Chengdu 610225, Sichuan, China; School of Aeronautics and Astronautics, University of Electronic Science and Technology of China, Chengdu 611731, Sichuan, China; College of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210000, Jiangsu, China)
Abstract:Imbalanced datasets are common in domains such as severe weather, fault diagnosis, network attacks, and financial fraud. To address the poor classification performance of the random forest algorithm on imbalanced datasets, this paper proposes a new oversampling method, SCSMOTE (Seed Center Synthetic Minority Over-sampling Technique). The key step is to select suitable candidate samples from the minority class, compute the center of those candidates, and generate new minority samples between the candidates and the center, which controls the quality of the synthesized minority class samples. SCSMOTE is then combined with the random forest algorithm to handle imbalanced datasets. Comparative experiments on UCI datasets show that the combined algorithm effectively improves the classification performance of random forest on imbalanced datasets.
Keywords:Imbalanced dataset  Minority class  Synthetic sample  Classification
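
The generation step summarized in the abstract (pick candidates from the minority class, compute their center, place new samples between a candidate and that center) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how candidates are selected, so the `is_candidate` hook below is a hypothetical placeholder that accepts every minority sample by default, and the names `centroid` and `synthesize_toward_center`, as well as the uniform interpolation factor, are assumptions made for illustration.

```python
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def centroid(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def synthesize_toward_center(
    minority: Sequence[Vector],
    n_new: int,
    is_candidate: Optional[Callable[[Vector], bool]] = None,
    seed: Optional[int] = None,
) -> List[Vector]:
    """Generate n_new synthetic minority samples.

    ASSUMPTION: the candidate-selection rule is not given in the abstract,
    so `is_candidate` is a hypothetical hook; when it is None, every
    minority sample is treated as a candidate.
    """
    rng = random.Random(seed)
    candidates = [x for x in minority if is_candidate is None or is_candidate(x)]
    if not candidates:
        raise ValueError("no candidate samples selected")
    center = centroid(candidates)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(candidates)
        # ASSUMPTION: uniform factor in [0, 1); 0 reproduces the candidate,
        # values near 1 approach the center.
        t = rng.random()
        synthetic.append([xi + t * (ci - xi) for xi, ci in zip(x, center)])
    return synthetic


# Toy minority class: four points on the corners of a rectangle.
minority = [[1.0, 1.0], [2.0, 1.0], [1.0, 3.0], [2.0, 3.0]]
new_points = synthesize_toward_center(minority, n_new=5, seed=0)
```

Because every synthetic point lies on the segment joining a candidate to the candidate center, new samples stay inside the region spanned by the selected minority samples. In a full pipeline the augmented training set would then be passed to a random forest classifier (for example scikit-learn's `RandomForestClassifier`); that step is omitted here to keep the sketch dependency-free.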
This article is indexed in VIP (维普) and other databases.