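Imbalanced Data Classification Method Based on Boundary Mixed Resampling
Cite this article: HOU Beibei, LIU Sanyang, PU Shiye. Imbalanced Data Classification Method Based on Boundary Mixed Resampling[J]. Computer Engineering and Applications, 2020, 56(1): 46-52.
Authors: HOU Beibei  LIU Sanyang  PU Shiye
Affiliation: School of Mathematics and Statistics, Xidian University, Xi'an 710126
Funding: Natural Science Foundation of Shaanxi Province; National Natural Science Foundation of China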
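Abstract: For imbalanced data classification, an algorithm based on boundary mixed resampling is proposed, with the aim of synthesizing valuable new samples and deleting original samples that have no influence. The algorithm first introduces the concept of support k-outlier degree to identify the boundary point set and the non-boundary point set of the dataset. An improved SMOTE algorithm then takes the boundary points of the minority class as target samples for synthesizing new points, while a distance-based undersampling algorithm is applied to the non-boundary points of the majority class, so that the classes become balanced. Experimental comparisons show that the algorithm improves the classification accuracy of the minority class to some extent while keeping the G-mean value favorable.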

Keywords: support k-outlier degree  resampling  boundary points  imbalanced data classification
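
The two resampling steps named in the abstract can be illustrated with a rough sketch. The abstract does not define the support k-outlier degree, the improvement made to SMOTE, or the distance used for undersampling, so everything concrete below is an assumption: `split_boundary` uses a plain stand-in test (a point counts as boundary if any of its k nearest neighbours has a different label), the oversampling is ordinary SMOTE-style interpolation seeded only at minority boundary points, and the undersampling drops the non-boundary majority points farthest from the minority centroid. The function names and the half-and-half split of the class gap are likewise hypothetical.

```python
import math
import random


def k_nearest(idx, points, k):
    """Indices of the k points closest to points[idx], excluding itself."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def split_boundary(points, labels, k=3):
    """Stand-in boundary test (NOT the paper's support k-outlier degree):
    a point is 'boundary' if any of its k nearest neighbours has another label."""
    boundary = set()
    for i in range(len(points)):
        if any(labels[j] != labels[i] for j in k_nearest(i, points, k)):
            boundary.add(i)
    return boundary


def mixed_resample(points, labels, minority=1, k=3, seed=0):
    """Oversample minority boundary points and undersample majority
    non-boundary points until the class gap is (roughly) closed."""
    rng = random.Random(seed)
    boundary = split_boundary(points, labels, k)
    min_idx = [i for i, y in enumerate(labels) if y == minority]
    maj_idx = [i for i, y in enumerate(labels) if y != minority]
    gap = len(maj_idx) - len(min_idx)
    if gap <= 0 or len(min_idx) < 2:
        return list(points), list(labels)

    # Oversampling: SMOTE-style interpolation between a minority boundary
    # point and one of its k nearest minority neighbours.
    seeds = [i for i in min_idx if i in boundary]
    n_new = gap // 2 if seeds else 0  # assumed split: half the gap is synthesized
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        neighbours = sorted(
            (j for j in min_idx if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        j = rng.choice(neighbours)
        t = rng.random()
        synthetic.append(
            tuple(a + t * (b - a) for a, b in zip(points[i], points[j]))
        )

    # Undersampling (assumed rule): drop the non-boundary majority points
    # that lie farthest from the minority class centroid.
    n_drop = gap - n_new
    centroid = tuple(
        sum(c) / len(min_idx) for c in zip(*(points[i] for i in min_idx))
    )
    removable = sorted(
        (i for i in maj_idx if i not in boundary),
        key=lambda i: math.dist(points[i], centroid),
        reverse=True,
    )
    dropped = set(removable[:n_drop])

    new_points = [p for i, p in enumerate(points) if i not in dropped] + synthetic
    new_labels = [y for i, y in enumerate(labels) if i not in dropped]
    new_labels += [minority] * len(synthetic)
    return new_points, new_labels
```

Calling `mixed_resample(points, labels)` with points given as coordinate tuples returns the resampled points and labels; majority points flagged as boundary are never removed, and every synthetic point lies on a segment between two existing minority points.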

Imbalanced Data Classification Method Based on Boundary Mixed Resampling
HOU Beibei, LIU Sanyang, PU Shiye. Imbalanced Data Classification Method Based on Boundary Mixed Resampling[J]. Computer Engineering and Applications, 2020, 56(1): 46-52.
Authors:HOU Beibei  LIU Sanyang  PU Shiye
Affiliation:School of Mathematics and Statistics, Xidian University, Xi’an 710126, China
Abstract: For imbalanced data classification, a novel method based on boundary mixed resampling is proposed that synthesizes valuable new samples and deletes original samples that have no influence. First, the concept of k-outlier is introduced to separate boundary samples from non-boundary samples, which are then handled differently. Minority samples on the boundary are taken as target points for synthesizing new samples, while non-boundary majority samples are undersampled based on distance, so that the classes become roughly balanced. Experimental comparisons show that the proposed algorithm improves the classification accuracy of minority samples to some extent while maintaining a good G-mean value.
Keywords:k-outlier  resampling  boundary points  imbalanced data classification  
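
The abstract reports results in terms of G-mean without spelling out the formula. A minimal sketch follows, assuming the usual binary definition: the geometric mean of sensitivity (recall on the minority class) and specificity (recall on the majority class). The function name `g_mean` and the `positive` parameter are illustrative choices, not taken from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against a class that is absent from y_true.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)
```

Because the two rates are multiplied, a classifier that labels every sample as majority gets a G-mean of 0 even when its overall accuracy is high, which is why the measure is commonly used on imbalanced data.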
This article is indexed in Wanfang Data and other databases.