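Density peaks clustering algorithm with K-nearest neighbors and weighted similarity
Cite this article: ZHAO Jia, CHEN Lei, WU Run-xiu, ZHANG Bo, HAN Long-zhe. Density peaks clustering algorithm with K-nearest neighbors and weighted similarity[J]. Control Theory & Applications, 2022, 39(12): 2349-2357.
Authors: ZHAO Jia  CHEN Lei  WU Run-xiu  ZHANG Bo  HAN Long-zhe
Affiliations: School of Information Engineering, Nanchang Institute of Technology; School of Information Engineering, Nanchang Institute of Technology; School of Information Engineering, Nanchang Institute of Technology; Global Energy Internet Research Institute Co., Ltd.; School of Information Engineering, Nanchang Institute of Technology
Funding: Supported by the National Natural Science Foundation of China (52069014, 61962036) and the Outstanding Youth Foundation of Jiangxi Province (2018ACB21029).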
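Abstract: The local density definition of the density peaks clustering algorithm does not consider the differences in sample density between clusters in data with uneven density distribution, which easily leads to the wrong cluster centers being selected. Its allocation strategy assigns samples in a chain through the density peaks according to Euclidean distance, whereas manifold data usually contain many samples that lie far from their density peak; as a result, a large number of samples that should belong to the same cluster are wrongly assigned to other clusters, and clustering accuracy is low. In view of this, this paper proposes a density peaks clustering algorithm with K-nearest neighbors and weighted similarity. The algorithm redefines the local density of a sample based on its K-nearest-neighbor information; this definition can adjust the magnitude of the local density and accurately find the density peaks. It defines the similarity between samples from their shared nearest neighbors and natural nearest neighbors, which removes the influence of Euclidean distance on the allocation strategy and avoids the cascading errors produced by that strategy. Comparative experiments on manifold datasets and on datasets with uneven density distribution show that the proposed algorithm accurately finds the density peaks of datasets whose clusters differ greatly in density, avoids cascading misallocation on manifold data, and obtains satisfactory clustering results; its clustering results on real datasets are also excellent.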
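
For orientation, the baseline procedure that the abstract criticizes can be sketched as follows. This is a minimal pure-Python illustration of the original density peaks clustering scheme (cutoff-kernel local density, distance to the nearest denser sample, chained assignment), not the algorithm proposed in the paper. The function name `dpc` and the parameters `d_c` and `n_clusters` are assumptions chosen for this sketch.

```python
import math


def dpc(points, d_c, n_clusters):
    """Baseline density peaks clustering (illustrative sketch only).

    points:     list of coordinate tuples
    d_c:        cutoff distance for the local density (assumed parameter)
    n_clusters: number of density peaks to keep (assumed parameter)
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other samples closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit samples from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {i: pos for pos, i in enumerate(order)}

    delta = [0.0] * n   # distance to the nearest sample that ranks higher
    parent = [-1] * n   # that sample's index (-1 for the densest sample)
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:pos], key=lambda h: dist[i][h])
        delta[i] = dist[i][j]
        parent[i] = j

    # Density peaks: samples with the largest rho * delta.
    centers = sorted(range(n),
                     key=lambda i: (-rho[i] * delta[i], rank[i]))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Chained assignment: each remaining sample inherits its parent's label,
    # so one wrong link is passed on to every sample that follows it.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels
```

Because every non-peak sample simply copies the label of its nearest denser sample in Euclidean distance, a single wrong link propagates down the chain, which is the cascading effect the abstract describes for manifold data.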

Keywords: density peaks clustering  local density  K-nearest neighbors  shared nearest neighbors  natural nearest neighbors
Received: 2021/8/28
Revised: 2022/12/15

Density peaks clustering algorithm with K-nearest neighbors and weighted similarity
ZHAO Jia, CHEN Lei, WU Run-xiu, ZHANG Bo, HAN Long-zhe. Density peaks clustering algorithm with K-nearest neighbors and weighted similarity[J]. Control Theory & Applications, 2022, 39(12): 2349-2357.
Authors: ZHAO Jia  CHEN Lei  WU Run-xiu  ZHANG Bo  HAN Long-zhe
Affiliations: School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi, China; School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi, China; School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi, China; Global Energy Internet Research Institute Co., Ltd., Nanjing, Jiangsu, China; School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi, China
Abstract: The local density definition of the density peaks clustering algorithm does not account for density differences between clusters in data with uneven density distribution, so cluster centers are easily selected incorrectly. Its allocation strategy assigns samples in a chain through the density peaks according to Euclidean distance, yet manifold data usually contain many samples far from their density peaks, so a large number of samples that should belong to the same cluster are misallocated to other clusters, which results in poor clustering accuracy. In view of this, this paper proposes a density peaks clustering algorithm with K-nearest neighbors and weighted similarity. The algorithm redefines the local density of a sample based on its K-nearest-neighbor information; this definition can adjust the magnitude of the local density and accurately find the density peaks. The shared nearest neighbors and natural nearest neighbors of the samples are used to define the similarity between samples, which eliminates the influence of Euclidean distance on the allocation strategy and avoids the cascading errors of the sample allocation strategy. Comparative experiments on manifold datasets and on datasets with uneven density distribution show that the algorithm accurately finds the density peaks of datasets with large differences in density, avoids cascading misallocation on manifold data, and obtains satisfactory clustering results. Its clustering results on real datasets are also excellent.
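
The neighborhood notions named in the abstract can be illustrated with a short pure-Python sketch. The abstract does not give the paper's formulas, so everything below is an assumption: `knn_density` uses a generic stand-in (the exponential of the negative mean distance to the K nearest neighbors), `shared_neighbors` returns the intersection of two K-nearest-neighbor lists, and `natural_neighbors` uses a simplified stopping rule (grow k until every sample is somebody's neighbor). None of these should be read as the authors' definitions, and the weighting that the paper applies to the similarity is not reproduced here.

```python
import math


def knn_lists(points, k):
    """Indices of each sample's k nearest neighbors, excluding itself."""
    n = len(points)
    lists = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (math.dist(points[i], points[j]), j))
        lists.append(others[:k])
    return lists


def knn_density(points, k):
    """Assumed stand-in density: exp(-mean distance to the k nearest neighbors).

    Unlike a fixed cutoff radius, this adapts to the local scale of each sample.
    """
    nbrs = knn_lists(points, k)
    return [math.exp(-sum(math.dist(points[i], points[j]) for j in nbrs[i]) / k)
            for i in range(len(points))]


def shared_neighbors(nbrs, i, j):
    """Samples that appear in the neighbor lists of both i and j."""
    return set(nbrs[i]) & set(nbrs[j])


def natural_neighbors(points):
    """Simplified natural-neighbor search (assumed stopping rule).

    Grows k until every sample appears in at least one k-NN list, then
    returns k and, for each sample, its mutual k-nearest neighbors.
    """
    n = len(points)
    if n < 2:
        return 0, [set() for _ in points]
    full = knn_lists(points, n - 1)
    for k in range(1, n):
        seen = {j for i in range(n) for j in full[i][:k]}
        if len(seen) == n:
            break
    nat = [{j for j in full[i][:k] if i in full[j][:k]} for i in range(n)]
    return k, nat
```

A similarity built from such neighbor sets depends on which neighbors two samples have in common rather than on the raw Euclidean distance to a density peak, which is the property the abstract relies on to limit cascading misallocation.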
Keywords: density peaks clustering  local density  K-nearest neighbors  shared nearest neighbors  natural nearest neighbors