
Classification Algorithm for Structured Imbalanced Data Based on Convolutional Neural Network
XU Hong, JIAO Guie, ZHANG Wenjun, CHEN Yimin. Classification Algorithm for Structured Imbalanced Data Based on Convolutional Neural Network[J]. Computer Engineering, 2023, 49(2): 81-89.
Authors: XU Hong, JIAO Guie, ZHANG Wenjun, CHEN Yimin
Affiliation: 1. College of Information Technology, Shanghai Ocean University, Shanghai 201306, China; 2. Shanghai Film Academy, Shanghai University, Shanghai 200072, China; 3. College of Information Technology, Shanghai Jian Qiao University, Shanghai 201306, China
Funding: National Natural Science Foundation of China (61572434); Shanghai Science and Technology Innovation Action Plan (19511104502, 16511101200); Science and Technology Commission of Shanghai Municipality Fund (19DZ22048).
Received: 2022-01-30
Revised: 2022-03-10
Abstract: Convolutional Neural Networks (CNNs) are widely used in image processing, object tracking, natural language processing, and other fields because they extract features efficiently and require relatively few parameters. To address the poor performance of traditional classification models on structured imbalanced data, this study proposes a CNN-based binary classification algorithm for structured imbalanced data. A structured data-processing algorithm, Data-Shuffle, converts the original imbalanced one-dimensional structured data into multi-channel imbalanced data in the form of a three-dimensional array, which gives the CNN more feature values to work with. An improved VGG network is then used to build convolution groups suited to imbalanced data, so that different features can be extracted. On this basis, an updated-weight weighted sampling algorithm, UWSCNN, is proposed: after each iteration, error-prone samples are reweighted according to the model's training results in order to optimize training. Experimental results on the adult, shoppers, and diabetes datasets show that, compared with traditional machine learning models such as logistic regression and random forest, the proposed Data-Shuffle algorithm improves F1 by 1%-19% and G-mean by 2%-24%. Compared with sampling algorithms such as SMOTECNN, BSMOTECNN, and SMOTECNN+CS, the proposed UWSCNN algorithm improves classification performance on imbalanced data by 1%-13%.
Keywords: imbalanced data; structured data; VGG network; deep learning; Convolutional Neural Network (CNN)
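
The abstract reports F1 and G-mean and describes reweighting error-prone samples after each iteration, but it does not give the UWSCNN update rule itself. The following is only a minimal sketch of the general idea, not the authors' implementation. The metric definitions are the standard ones for binary classification (label 1 taken as the minority class). The function names, the multiplicative `boost` factor, and the renormalization step are all assumptions made for illustration.

```python
import numpy as np


def f1_and_g_mean(y_true, y_pred):
    """Return (F1, G-mean) for binary labels, with 1 as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    g_mean = np.sqrt(recall * specificity)
    return float(f1), float(g_mean)


def update_sample_weights(weights, y_true, y_pred, boost=2.0):
    """Hypothetical update: scale misclassified samples by `boost`, renormalize."""
    weights = np.asarray(weights, dtype=float).copy()
    wrong = np.asarray(y_true) != np.asarray(y_pred)
    weights[wrong] *= boost
    return weights / weights.sum()


def sample_indices(weights, size, rng):
    """Draw training indices with replacement, proportional to the weights."""
    return rng.choice(len(weights), size=size, replace=True, p=weights)


# Toy example: 2 minority samples, 6 majority samples, 2 prediction errors.
y_true = np.array([1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0, 0, 0, 1])

f1, g_mean = f1_and_g_mean(y_true, y_pred)
weights = update_sample_weights(np.full(8, 1 / 8), y_true, y_pred)
next_batch = sample_indices(weights, size=16, rng=np.random.default_rng(0))
print(f1, g_mean, weights, next_batch)
```

In this toy run the two misclassified samples end up with twice the sampling probability of the others, so they appear more often in the next round of training. Any scheme that raises the weight of hard samples would fit the same loop; the doubling used here is an arbitrary choice.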