Density peaks clustering algorithm with K-nearest neighbors and weighted similarity


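Cite this article:	ZHAO Jia, CHEN Lei, WU Run-xiu, ZHANG Bo, HAN Long-zhe. Density peaks clustering algorithm with K-nearest neighbors and weighted similarity[J]. Control Theory & Applications, 2022, 39(12): 2349-2357.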

Authors:	ZHAO Jia, CHEN Lei, WU Run-xiu, ZHANG Bo, HAN Long-zhe

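Affiliations:	School of Information Engineering, Nanchang Institute of Technology; School of Information Engineering, Nanchang Institute of Technology; School of Information Engineering, Nanchang Institute of Technology; Global Energy Internet Research Institute Co., Ltd.; School of Information Engineering, Nanchang Institute of Technology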

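Funding:	Supported by the National Natural Science Foundation of China (52069014, 61962036) and the Jiangxi Province Distinguished Young Scholars Fund (2018ACB21029).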

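Abstract:	The local density definition of the density peaks clustering algorithm does not account for the density differences between clusters in data with uneven density distribution, which easily leads to wrongly selected cluster centers. Its assignment strategy chains samples to density peaks according to Euclidean distance, yet manifold data usually contain many samples that lie far from their density peak, so a large number of samples that should belong to the same cluster are wrongly assigned to other clusters and clustering accuracy suffers. To address this, this paper proposes a density peaks clustering algorithm with K-nearest neighbors and weighted similarity. The algorithm redefines the local density of a sample from its K-nearest-neighbor information; this definition can adjust the magnitude of the local density and find the density peaks accurately. It defines the similarity between samples from their shared nearest neighbors and natural nearest neighbors, which removes the influence of Euclidean distance on the assignment strategy and avoids the cascading errors that the assignment strategy otherwise produces. Comparative experiments on manifold datasets and on datasets with uneven density distribution show that the proposed algorithm accurately finds the density peaks of datasets whose clusters differ greatly in sparsity, avoids cascading assignment errors on manifold data, and obtains satisfactory clustering results; its clustering results on real-world datasets are also very good.
Keywords:	density peaks clustering; local density; K-nearest neighbors; shared nearest neighbors; natural nearest neighbors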
Received:	2021/8/28
Revised:	2022/12/15
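
The abstract describes replacing the cutoff-based local density of density peaks clustering with one built from K-nearest-neighbor information, but it does not state the formula. The sketch below is therefore only an illustration under stated assumptions: it uses a commonly seen KNN density, exp(-mean squared distance to the K nearest neighbors), together with the standard relative distance (distance to the nearest sample of higher density). The function names `knn_density`, `relative_distance`, and `pick_peaks` are hypothetical and are not taken from the paper.

```python
import numpy as np

def knn_density(X, k):
    """Illustrative KNN-based local density (an assumption, not the paper's formula)."""
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Distances to the k nearest neighbors; column 0 is the sample itself
    # (assumes there are no duplicate points).
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    rho = np.exp(-(knn_dist ** 2).mean(axis=1))
    return rho, dist

def relative_distance(rho, dist):
    """delta_i: distance from sample i to the nearest sample with higher density."""
    delta = np.empty(len(rho))
    for i in range(len(rho)):
        higher = np.flatnonzero(rho > rho[i])
        # The densest sample has no denser neighbor; use its largest distance instead.
        delta[i] = dist[i, higher].min() if higher.size else dist[i].max()
    return delta

def pick_peaks(rho, delta, n_clusters):
    """Return the indices with the largest rho * delta as candidate density peaks."""
    gamma = rho * delta
    return np.argsort(gamma)[::-1][:n_clusters]
```

Because this density depends only on the distances to each sample's own neighbors, a sparse cluster can still produce a clear peak next to a dense one, which is the effect the abstract attributes to its redefined density. The exact weighting used in the paper may differ.
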
Density peaks clustering algorithm with K-nearest neighbors and weighted similarity

ZHAO Jia, CHEN Lei, WU Run-xiu, ZHANG Bo, HAN Long-zhe. Density peaks clustering algorithm with K-nearest neighbors and weighted similarity[J]. Control Theory & Applications, 2022, 39(12): 2349-2357.

Authors:	ZHAO Jia, CHEN Lei, WU Run-xiu, ZHANG Bo, HAN Long-zhe

Affiliation:	School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi, China; School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi, China; School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi, China; Global Energy Internet Research Institute Co., Ltd., Nanjing, Jiangsu, China; School of Information Engineering, Nanchang Institute of Technology, Nanchang, Jiangxi, China

Abstract:	The local density definition of the density peaks clustering algorithm does not take into account the differences in sample density between clusters in data with uneven density distribution, which can easily lead to wrongly selected cluster centers. Its allocation strategy assigns samples in a chain through the density peaks according to Euclidean distance, and manifold data usually have many samples far from their density peaks, so a large number of samples that should belong to the same cluster are misallocated to other clusters, which results in poor clustering accuracy. In view of this, this paper proposes a density peaks clustering algorithm with K-nearest neighbors and weighted similarity. The local density of a sample is redefined based on its K-nearest-neighbor information, which can adjust the magnitude of the local density and accurately find the density peaks. The shared nearest neighbors and natural nearest neighbors of the samples are used to define the similarity between samples, which eliminates the influence of Euclidean distance on the allocation strategy and avoids the cascading misallocation that the allocation strategy otherwise causes. Comparative experiments on manifold datasets and on datasets with uneven density distribution show that the algorithm can accurately find the density peaks of datasets with large differences in density, avoid cascading misallocation on manifold data, and achieve satisfactory clustering results. The clustering results on real datasets are also excellent.

Keywords:	density peaks clustering; local density; K-nearest neighbors; shared nearest neighbors; natural nearest neighbors
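
The abstract states that similarity between samples is defined from shared nearest neighbors and natural nearest neighbors rather than from Euclidean distance, without giving the weighting. As a minimal sketch of the shared-nearest-neighbor part only, the code below counts how many of their K nearest neighbors two samples have in common. The plain count, the omission of natural nearest neighbors, and the names `knn_sets` and `snn_similarity` are assumptions made for illustration, not the paper's definition.

```python
import numpy as np

def knn_sets(X, k):
    """Return, for each sample, the set of indices of its k nearest neighbors."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Column 0 of the sorted order is the sample itself (assumes no duplicate points).
    order = np.argsort(dist, axis=1)[:, 1:k + 1]
    return [set(row.tolist()) for row in order]

def snn_similarity(X, k):
    """Shared-nearest-neighbor count: |KNN(i) & KNN(j)| for every pair (i, j)."""
    neighbors = knn_sets(X, k)
    n = len(neighbors)
    sim = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = len(neighbors[i] & neighbors[j])
    return sim
```

Two samples on the same elongated or curved cluster can be far apart in Euclidean terms yet be linked through neighbors with overlapping neighborhoods, while samples in different clusters typically share none. That is the intuition behind assigning samples by neighborhood overlap instead of by distance to a peak.
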
