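Unbalanced Data Mining of Self-Encoding Network under Spectral Clustering Undersampling
Cite this article: WANG Shu-Fan, YAN Tao, JIANG Xin-Ying. Unbalanced Data Mining of Self-Encoding Network under Spectral Clustering Undersampling[J]. Computer Systems & Applications, 2021, 30(10): 331-335.
Authors: WANG Shu-Fan, YAN Tao, JIANG Xin-Ying
Affiliation: School of Mathematics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
Abstract: Imbalanced data sets are used in an increasingly wide range of applications, and the demands placed on them keep growing. To improve the classification accuracy on the data set as a whole, this study builds an autoencoder-network method for imbalanced data mining that takes spectral clustering undersampling as its precondition. The clustering problem is converted into a multiway partition problem on an undirected graph, and spectral clustering is completed using the undirected graph together with normalization. The majority-class data are then selectively undersampled to obtain the classification boundary offset. An autoencoder network, whose learning process is unsupervised, raises and lowers the dimensionality of the data to extract the hidden features of each dimension, giving an efficient learned representation of the data at every level. The autoencoder network is adjusted according to a comparison between the maximum mean discrepancy (MMD) and a preset threshold, and imbalanced data mining is completed using the resulting classification surface. Ten groups of data were drawn as test sets from UCI data sets with different practical application backgrounds. After spectral clustering undersampling, simulation experiments show that the proposed method substantially improves minority-class classification accuracy and overall mining performance, demonstrating good applicability and feasibility.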
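The abstract only outlines the spectral clustering undersampling step. The sketch below is one possible reading, not the authors' implementation. It assumes a fully connected Gaussian (RBF) similarity graph, the symmetric normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}, a row-normalized eigenvector embedding clustered with k-means (the Ng-Jordan-Weiss variant), and a hypothetical selection rule. Under that rule, each cluster keeps a share of majority samples proportional to its size, starting with the samples nearest the cluster center. The function names, `sigma`, and the selection rule are all illustrative assumptions.

```python
import numpy as np

def spectral_embed(X, k, sigma=1.0):
    """Normalized spectral embedding of the rows of X (assumed NJW variant)."""
    # Assumption: fully connected graph with Gaussian (RBF) edge weights.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: L_sym = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest eigenvectors.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    # Row-normalize so every point lies on the unit sphere.
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(Z, k, iters=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((Z[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_undersample(X_maj, n_keep, k=2, sigma=1.0):
    """Hypothetical rule: keep about n_keep majority samples, drawn from every
    spectral cluster in proportion to its size, nearest-to-center first."""
    labels = kmeans(spectral_embed(X_maj, k, sigma), k)
    keep = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        quota = max(1, round(n_keep * len(idx) / len(X_maj)))
        center = X_maj[idx].mean(axis=0) if len(idx) else None
        if center is None:
            continue
        order = np.argsort(np.linalg.norm(X_maj[idx] - center, axis=1))
        keep.extend(idx[order[:quota]].tolist())
    return np.array(sorted(keep[:n_keep]))

# Toy majority class: a 20-point blob near (0, 0) and a 10-point blob near (5, 5).
rng = np.random.default_rng(0)
X_maj = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (10, 2))])
kept = spectral_undersample(X_maj, n_keep=6, k=2)
print(kept)  # 4 indices from the first blob (0-19), 2 from the second (20-29)
```

Sampling per cluster, rather than uniformly at random, keeps every region of the majority class represented after undersampling. This is the usual motivation for cluster-based undersampling. Whether the paper keeps the points nearest to or farthest from each center is not stated in the abstract.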

Keywords: spectral clustering; undersampling; autoencoder network; imbalanced data; classification boundary; clustering center
Received: 2020-12-24
Revised: 2021-01-25

Unbalanced Data Mining of Self-Encoding Network under Spectral Clustering Undersampling
WANG Shu-Fan, YAN Tao, JIANG Xin-Ying. Unbalanced Data Mining of Self-Encoding Network under Spectral Clustering Undersampling[J]. Computer Systems & Applications, 2021, 30(10): 331-335.
Authors: WANG Shu-Fan, YAN Tao, JIANG Xin-Ying
Affiliation: School of Mathematics and Statistics, Shanghai University of Engineering Science, Shanghai 201620, China
Abstract: Imbalanced data sets are used in an increasingly wide range of applications, and the demands placed on them keep rising. Taking spectral clustering undersampling as a prerequisite, this study develops an imbalanced data mining method based on a self-encoding (autoencoder) network to improve the classification accuracy on the data set as a whole. The clustering problem is converted into a multiway partition problem on an undirected graph, and spectral clustering is completed using the undirected graph and normalization. The majority-class data set is then selectively undersampled to obtain the classification boundary offset. An autoencoder network, which learns without supervision, increases and reduces the dimensionality of the data to extract the hidden features of each dimension, so that the data are represented and learned efficiently at every level. The autoencoder network is adjusted according to a comparison between the maximum mean discrepancy (MMD) and a preset threshold, and imbalanced data mining is then completed with the resulting classification surface. Ten sets of data were extracted as test sets from UCI data sets with different practical application backgrounds. After spectral clustering undersampling, simulation experiments show that the proposed method greatly improves minority-class classification accuracy and overall mining performance, indicating good applicability and feasibility.
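The abstract says that the autoencoder is adjusted by comparing the maximum mean discrepancy (MMD) with a preset threshold. It does not say which two sample sets are compared or which kernel is used. The sketch below assumes a Gaussian kernel and the biased (V-statistic) estimator MMD^2(X, Y) = mean k(x, x') + mean k(y, y') - 2 mean k(x, y). The helper `needs_adjustment` is hypothetical. It encodes one reading of the rule: keep adjusting while MMD^2 exceeds the threshold.

```python
import math
from typing import Sequence

Vector = Sequence[float]

def rbf(x: Vector, y: Vector, sigma: float) -> float:
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def mmd2_biased(X: Sequence[Vector], Y: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between samples X and Y."""
    m, n = len(X), len(Y)
    kxx = sum(rbf(a, b, sigma) for a in X for b in X) / (m * m)
    kyy = sum(rbf(a, b, sigma) for a in Y for b in Y) / (n * n)
    kxy = sum(rbf(a, b, sigma) for a in X for b in Y) / (m * n)
    return kxx + kyy - 2.0 * kxy

def needs_adjustment(X: Sequence[Vector], Y: Sequence[Vector],
                     threshold: float, sigma: float = 1.0) -> bool:
    """Hypothetical stopping rule: adjust the autoencoder while MMD^2 > threshold."""
    return mmd2_biased(X, Y, sigma) > threshold

same = [[0.0, 0.0], [1.0, 1.0]]
far = [[5.0, 5.0], [6.0, 6.0]]
print(mmd2_biased(same, same))                     # 0.0 for identical samples
print(needs_adjustment(same, far, threshold=0.1))  # True: the distributions differ
```

In practice X and Y would be feature vectors produced by the network, for example encoder outputs for two sample groups. That pairing, the kernel width `sigma`, and the threshold value are assumptions here rather than details given in the abstract.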
Keywords: spectral clustering; undersampling; self-encoding network; unbalanced data; classification boundary; clustering center
This article is indexed in Wanfang Data and other databases.