
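Research on Sampling Strategies in Class-Imbalanced Learning
Cite this article: LIU Shudong, ZHANG Ke. Research on Sampling Strategies in Class-Imbalanced Learning[J]. Computer Engineering and Applications, 2019, 55(21): 1-17.
Authors: LIU Shudong  ZHANG Ke
Affiliations: Centre for Artificial Intelligence and Applied Research, Zhongnan University of Economics and Law, Wuhan 430073; School of Information and Security Engineering, Zhongnan University of Economics and Law, Wuhan 430073
Funding: National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities, Zhongnan University of Economics and Law
Abstract: Class-imbalanced learning is widely applied in credit assessment, customer churn prediction, medical diagnosis, short-text sentiment analysis, label learning, rating prediction, and many other fields. It is one of the active directions in machine learning research and application, and in recent years it has drawn growing attention from both academia and industry. Three main approaches currently address the class imbalance problem: data-level methods, algorithm-level methods, and ensemble methods. This paper surveys recent progress on sampling strategies in class-imbalanced learning. It introduces the basic framework of class-imbalanced learning; summarizes, compares, and analyzes recent research on the three main sampling strategies (oversampling, undersampling, and hybrid sampling); and discusses the open difficulties, active topics, and development trends that remain for sampling strategies in class-imbalanced learning.

Keywords: imbalanced learning  ensemble learning  undersampling  feature selection  support vector machine  synthetic minority oversampling technique  hybrid sampling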

Research on Sampling Strategies in Class-Imbalanced Learning
LIU Shudong, ZHANG Ke. Research on Sampling Strategies in Class-Imbalanced Learning[J]. Computer Engineering and Applications, 2019, 55(21): 1-17.
Authors:LIU Shudong  ZHANG Ke
Affiliation: 1. Centre for Artificial Intelligence and Applied Research, Zhongnan University of Economics and Law, Wuhan 430073, China; 2. School of Information and Security Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China
Abstract: Class-imbalanced learning is widely used in many application domains, such as credit scoring, customer churn prediction, medical diagnosis, short-text sentiment analysis, label learning, and rating prediction. It has become one of the most active topics in machine learning and its applications, and it has recently attracted growing attention from both industry and academia. The many solutions proposed for the class imbalance problem fall into three groups: data-level solutions, algorithm-level solutions, and ensemble solutions. This paper surveys sampling strategies in class-imbalanced learning, an important family of data-level solutions. It introduces the basic issues of class-imbalanced learning, including the formal definition, performance metrics, and the basic framework, and it reviews in detail recent developments in over-sampling, under-sampling, and hybrid sampling, the three main sampling strategies in class-imbalanced learning. Prospects for future development and suggestions for possible extensions are also discussed.
Keywords:class-imbalanced learning  ensemble learning  undersampling  feature selection  support vector machine  synthetic minority oversampling technique  hybrid sampling  
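The three sampling strategies named in the abstract can be illustrated with their simplest random variants. The sketch below is not taken from the paper: the function names (`random_oversample`, `random_undersample`, `hybrid_sample`), the choice of the mean class size as the hybrid target, and the toy data are all assumptions made for illustration. Methods such as the synthetic minority oversampling technique listed in the keywords generate new samples rather than duplicating existing ones and are not shown here.

```python
import random
from collections import Counter


def _pools(X, y):
    """Group samples by class label (hypothetical helper for this sketch)."""
    pools = {}
    for x, label in zip(X, y):
        pools.setdefault(label, []).append(x)
    return pools


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest class."""
    rng = random.Random(seed)
    pools = _pools(X, y)
    target = max(len(p) for p in pools.values())
    X_out, y_out = [], []
    for label, pool in pools.items():
        extra = [rng.choice(pool) for _ in range(target - len(pool))]
        X_out.extend(pool + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    pools = _pools(X, y)
    target = min(len(p) for p in pools.values())
    X_out, y_out = [], []
    for label, pool in pools.items():
        X_out.extend(rng.sample(pool, target))
        y_out.extend([label] * target)
    return X_out, y_out


def hybrid_sample(X, y, seed=0):
    """Resample every class to the mean class size (an assumed target):
    larger classes are undersampled, smaller classes are oversampled."""
    rng = random.Random(seed)
    pools = _pools(X, y)
    target = sum(len(p) for p in pools.values()) // len(pools)
    X_out, y_out = [], []
    for label, pool in pools.items():
        if len(pool) >= target:
            chosen = rng.sample(pool, target)
        else:
            chosen = pool + [rng.choice(pool) for _ in range(target - len(pool))]
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out


# Toy data: 10 majority samples (label 0) and 2 minority samples (label 1).
X = list(range(12))
y = [0] * 10 + [1] * 2

for name, sampler in [("over", random_oversample),
                      ("under", random_undersample),
                      ("hybrid", hybrid_sample)]:
    _, y_new = sampler(X, y)
    print(name, dict(Counter(y_new)))
```

On the toy data, oversampling brings both classes to 10 samples, undersampling brings both to 2, and the hybrid variant brings both to 6. Oversampling keeps every original sample at the cost of duplicates, while undersampling discards majority-class samples.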
This article is indexed in databases including VIP and Wanfang Data.