
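An enhancing probability algorithm for imbalanced datasets based on GMM-EM
Cite this article: CHEN Gang, WU Zhen-jia. An enhancing probability algorithm for imbalanced datasets based on GMM-EM[J]. Control and Decision, 2020, 35(3): 763-768.
Authors: CHEN Gang  WU Zhen-jia
Affiliation: School of Science, Dalian Maritime University, Dalian 116026, Liaoning, China; School of Science, Dalian Maritime University, Dalian 116026, Liaoning, China
Funding: National Natural Science Foundation of China (11571056).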
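Abstract: Classifying imbalanced data is an important research topic in machine learning. In an imbalanced dataset, the minority class has markedly fewer training samples than the majority class, so classification results tend to be biased toward the majority class. To address this problem, a probability enhancement algorithm based on the Gaussian mixture model and expectation-maximization (GMM-EM) method is proposed. First, the probability density function of the minority-class data is built using a Gaussian mixture model (GMM) and the expectation-maximization (EM) algorithm. Second, based on the property that samples with high probability density are better able to generate new samples than samples with low probability density, an oversampling algorithm based on the minority-class density function is constructed; the algorithm keeps the probability distribution of the minority class consistent before and after balancing, so that the minority class is balanced in terms of the statistical properties of the dataset. Finally, a decision tree classifier is used to classify the balanced dataset, and evaluation metrics are used to assess the classification performance. Classification experiments on eight datasets selected from the UCI and KEEL repositories show that the proposed algorithm is more effective than existing algorithms.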

Keywords: classification  imbalanced data  probability density function  GMM-EM  probability enhancement

An enhancing probability algorithm for imbalanced datasets based on GMM-EM
CHEN Gang and WU Zhen-jia. An enhancing probability algorithm for imbalanced datasets based on GMM-EM[J]. Control and Decision, 2020, 35(3): 763-768.
Authors: CHEN Gang and WU Zhen-jia
Affiliation: School of Science, Dalian Maritime University, Dalian 116026, China and School of Science, Dalian Maritime University, Dalian 116026, China
Abstract: The classification of imbalanced datasets is a vital issue in machine learning. In an imbalanced dataset, the minority class has markedly fewer training examples than the majority class, so classification results may be biased toward the majority class and overall classification performance tends to be poor. To address this problem, an enhanced probability algorithm based on the Gaussian mixture model-expectation maximization (GMM-EM) method is proposed for imbalanced datasets. First, the probability density functions (PDFs) of the minority class are obtained using the GMM and EM algorithms. Second, because original samples with high probability density are, in line with basic probability theory, better able to generate new instances than samples with low probability density, an enhanced probability algorithm is constructed from the PDF of the minority class. The algorithm ensures that the PDFs of the balanced minority class agree with those of the original minority class, so the minority class is balanced in a statistical sense. Finally, the proposed algorithm and other methods are each combined with a decision tree classifier for assessment. Experiments on eight imbalanced datasets from the UCI and KEEL repositories show that the proposed algorithm is more effective than the other methods.
Keywords: classification  imbalanced data  probability density function  GMM-EM  probability enhancement
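
The oversampling idea summarized in the abstract (fit a GMM to the minority class with EM, then generate new minority samples so that high-density regions yield more of them) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact sampling rule, so the sketch simply draws synthetic points from the fitted mixture. The one-dimensional data, the function names (gaussian_pdf, fit_gmm_em, oversample_minority), and parameters such as k, n_iter, and min_var are all assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(data, k=2, n_iter=100, seed=0, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with k components using plain EM.

    Hypothetical helper for illustration; returns (weights, means, variances).
    """
    rng = random.Random(seed)
    n = len(data)
    means = rng.sample(data, k)  # initialise means at k randomly chosen samples
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])

        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # avoid division by zero
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # keep variances strictly positive

    return weights, means, variances


def oversample_minority(minority, n_target, k=2, seed=0):
    """Grow the minority class to n_target samples by sampling the fitted GMM.

    Assumption: drawing from the mixture stands in for the paper's
    density-based generation rule, which the abstract does not spell out.
    """
    weights, means, variances = fit_gmm_em(minority, k=k, seed=seed)
    rng = random.Random(seed + 1)
    synthetic = []
    for _ in range(max(0, n_target - len(minority))):
        j = rng.choices(range(k), weights=weights)[0]  # pick a component
        synthetic.append(rng.gauss(means[j], math.sqrt(variances[j])))
    return minority + synthetic, (weights, means, variances)


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Toy minority class with two clusters (made-up data).
    toy = ([data_rng.gauss(0.0, 1.0) for _ in range(30)]
           + [data_rng.gauss(8.0, 1.0) for _ in range(30)])
    balanced, params = oversample_minority(toy, n_target=200)
    print(len(balanced), params)
```

Because the synthetic points are drawn from the density estimated on the original minority samples, the enlarged minority class follows approximately the same distribution as before balancing, which is the property the abstract emphasizes. A real application would use multivariate data and pass the balanced set to a classifier such as a decision tree.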