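Fast density-peak search clustering algorithm for complex high-dimensional data
Cite this article: CHEN Jun-fen (陈俊芬), ZHANG Ming (张明), ZHAO Jia-cheng (赵佳成). Fast density-peak search clustering algorithm for complex high-dimensional data [J]. Computer Science (计算机科学), 2020, 47(3): 79-86.
Authors: CHEN Jun-fen (陈俊芬)  ZHANG Ming (张明)  ZHAO Jia-cheng (赵佳成)
Affiliation: Hebei Key Laboratory of Machine Learning and Computational Intelligence, College of Mathematics and Information Sciences, Hebei University, Baoding, Hebei 071002
Funding: Natural Science Foundation of Hebei Province; Scientific Research Start-up Fund Project for High-level Innovative Talents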
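Abstract: Unsupervised clustering algorithms in machine learning have been widely applied to a variety of object recognition tasks. The clustering algorithm by fast search and find of density peaks (DPC) can determine cluster centers and the number of clusters quickly and effectively, but when it handles data with complex distribution shapes or high-dimensional image data, the cluster centers remain hard to determine and the number of clusters found tends to be too small. To improve its robustness on complex high-dimensional data, this paper proposes a density-peak fast-search clustering algorithm based on learned feature representations (AE-MDPC). The algorithm uses an unsupervised autoencoder (AutoEncoder) to learn the optimal feature representation of the data and combines it with a manifold similarity that captures the global consistency of the data. This improves the compactness of data within a class and the separation between classes, driving the density of each potential cluster center to become a local maximum. AE-MDPC is compared with the classical K-means, DBSCAN, and DPC algorithms and with DPC combined with PCA on four synthetic datasets and four real image datasets. The experimental results show that, on the external evaluation metric clustering accuracy and the internal evaluation metrics adjusted mutual information and adjusted Rand index, AE-MDPC outperforms the compared algorithms and also provides better visualization. In summary, AE-MDPC, which builds on feature representation learning and manifold distance, can effectively handle complex manifold data and high-dimensional image data.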

Keywords: Clustering  Density peaks  DPC algorithm  Feature representation  Manifold distance
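
The density-peaks step that the abstract builds on can be illustrated with a minimal sketch. It follows the general DPC idea (a local density for each point, the distance from each point to its nearest denser point, and centers chosen where both are large) and is not the authors' AE-MDPC implementation. The function name `density_peaks`, the Gaussian density kernel, the cutoff `d_c`, and the rule of taking the `n_clusters` largest products of density and distance are all assumptions made for illustration.

```python
import numpy as np


def density_peaks(dist, d_c, n_clusters):
    """Toy density-peaks clustering on a precomputed distance matrix.

    dist       : (n, n) symmetric matrix of pairwise distances
    d_c        : cutoff distance for the density kernel (assumed parameter)
    n_clusters : number of centers to pick from the decision values
    """
    n = dist.shape[0]
    # Local density with a Gaussian kernel; subtract 1 to drop the self term.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    # Point indices sorted by decreasing density.
    order = np.argsort(-rho, kind="stable")

    delta = np.zeros(n)          # distance to the nearest denser point
    parent = np.full(n, -1)      # index of that nearest denser point
    # By convention the densest point receives its largest distance.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]
        delta[i] = dist[i, j]
        parent[i] = j

    # Centers combine high density with a large distance to denser points.
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]

    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Visit points from dense to sparse so each parent is labelled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers, rho, delta
```

Because the function takes a distance matrix instead of raw features, the same sketch can be fed Euclidean distances, distances between autoencoder codes, or a manifold distance, which is where the abstract's two modifications would plug in.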

Clustering Algorithm by Fast Search and Find of Density Peaks for Complex High-dimensional Data
CHEN Jun-fen, ZHANG Ming, ZHAO Jia-cheng. Clustering Algorithm by Fast Search and Find of Density Peaks for Complex High-dimensional Data [J]. Computer Science, 2020, 47(3): 79-86.
Authors: CHEN Jun-fen  ZHANG Ming  ZHAO Jia-cheng
Affiliation: (Hebei Key Laboratory of Machine Learning and Computational Intelligence, College of Mathematics and Information Sciences, Hebei University, Baoding, Hebei 071002, China)
Abstract: Unsupervised clustering in machine learning is widely applied to object recognition tasks. The clustering algorithm by fast search and find of density peaks (DPC) can quickly determine the cluster centers and the number of clusters from a decision graph. However, when it handles data with complex distribution shapes or high-dimensional image data, DPC still has problems: the cluster centers are difficult to determine and too few clusters are found. To improve its robustness on complex high-dimensional data, an improved DPC clustering algorithm (AE-MDPC) is presented. It employs an autoencoder, an unsupervised learning method, to obtain the optimal feature representation of the input data, and uses the manifold similarity of pairwise data to describe global consistency. The autoencoder reduces feature noise by reducing the dimension of the high-dimensional image data, while the manifold distance drives the densities of the potential cluster centers to become local peaks. AE-MDPC is compared with K-means, DBSCAN, DPC, and DPC combined with PCA on four artificial datasets and four real face image datasets. The experimental results demonstrate that AE-MDPC outperforms the other clustering algorithms on clustering accuracy, adjusted mutual information, and adjusted Rand index, and that it also provides better clustering visualization. Overall, the proposed AE-MDPC algorithm can effectively handle complex manifold data and high-dimensional image data.
Keywords: Clustering  Density peaks  DPC algorithm  Feature representation  Manifold distance
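
The abstract does not give the formula for its manifold distance, so the following is only a sketch of one common density-sensitive construction, offered as an assumption and not as the paper's definition. Each pair of points is joined by an edge of length `rho ** d - 1`, where `d` is their Euclidean distance, and the manifold distance is the shortest path through that graph. The function name `manifold_distance` and the scaling parameter `rho` are hypothetical.

```python
import numpy as np


def manifold_distance(X, rho=2.0):
    """Shortest-path distance over exponentially stretched edges.

    X   : (n, d) array of points
    rho : stretching factor > 1 (assumed parameter)
    """
    euclid = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Edge length grows exponentially with Euclidean distance, so a chain of
    # short hops along a dense region costs less than one long jump.
    d = rho ** euclid - 1.0
    n = d.shape[0]
    # Floyd-Warshall all-pairs shortest paths, vectorised over rows/columns.
    for k in range(n):
        d = np.minimum(d, d[:, k, None] + d[None, k, :])
    return d
```

For four points spaced one unit apart on a line with `rho=2`, the direct edge between the end points has length 2**3 - 1 = 7, while three unit hops cost 3 * (2**1 - 1) = 3, so the end points are treated as closer when intermediate points connect them. This is the sense in which such a distance reflects global consistency along a manifold.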
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).