
Mixed data clustering initialization method using outlier detection technology (采用离群点检测技术的混合型数据聚类初始化方法)

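Cite this article:	YANG Zhiyong, JIANG Feng, YU Xu, DU Junwei. Mixed data clustering initialization method using outlier detection technology [J]. 智能系统学报 (CAAI Transactions on Intelligent Systems), 2023, 18(1): 56-65.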

Authors:	YANG Zhiyong (杨志勇), JIANG Feng (江峰), YU Xu (于旭), DU Junwei (杜军威)

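Affiliation:	School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, Shandong 266100, China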

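Abstract:	In recent years, the problem of clustering mixed-type data has attracted wide attention. The K-prototype clustering algorithm is an effective method for mixed-type data, but it usually initializes the cluster centers by random selection, and in many practical applications this strategy cannot guarantee the quality of the clustering result. To address this problem, a strategy based on outlier detection is used to select the initial centers for the K-prototype algorithm, and a new initialization algorithm for mixed-type data clustering is proposed (initialization of K-prototype clustering based on outlier detection and density, IKP-ODD). Given a candidate object, IKP-ODD decides whether it is an initial center by computing its distance outlier factor, its weighted density, and its weighted distances to the initial centers already chosen. The distance outlier factor and the weighted density keep outliers from being selected as initial centers. When computing the weighted density of an object and the weighted distance between objects, the granular neighborhood entropy from neighborhood rough sets is used to measure the significance of each attribute, and each attribute is weighted according to its significance, so that the differences between attributes are reflected. Experiments on several UCI datasets show that IKP-ODD handles the initialization problem of K-prototype clustering better than existing initialization methods.
Keywords:	clustering initialization; mixed-type data; outlier detection; neighborhood rough set; granular neighborhood entropy; distance outlier factor; weighted density; weighted distance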
Mixed data clustering initialization method using outlier detection technology

YANG Zhiyong, JIANG Feng, YU Xu, DU Junwei. Mixed data clustering initialization method using outlier detection technology [J]. CAAI Transactions on Intelligent Systems, 2023, 18(1): 56-65.

Authors:	YANG Zhiyong, JIANG Feng, YU Xu, DU Junwei

Affiliation:	School of Information Science & Technology, Qingdao University of Science and Technology, Qingdao 266100, China

Abstract:	In recent years, the clustering of mixed-type data has received wide attention. The K-prototype clustering algorithm is an effective method for processing mixed-type data, but it usually initializes the cluster centers by random selection, which makes it difficult to guarantee the quality of the clustering results in many practical applications. To address this problem, we select the initial centers for the K-prototype algorithm based on outlier detection and present a new initialization algorithm for mixed-type data clustering (Initialization of K-prototype Clustering Based on Outlier Detection and Density, denoted IKP-ODD). Given a candidate object, IKP-ODD determines whether it is an initial center by calculating its distance outlier factor, its weighted density, and its weighted distances from the existing initial centers. The distance outlier factor and the weighted density prevent outliers from being selected as initial centers. When calculating the weighted densities of objects and the weighted distances between objects, we use the granular neighborhood entropy in neighborhood rough sets to calculate the significance of each attribute and assign each attribute a weight according to its significance, which reflects the differences between attributes. Experiments on several UCI datasets show that IKP-ODD performs better than existing initialization methods on the initialization problem of K-prototype clustering.
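
The abstract names the three quantities IKP-ODD combines but does not give their formulas, so the following Python sketch is only an illustration of the general idea, not the authors' algorithm. Everything in it is an assumption: the function names `pairwise_mixed_distance` and `select_initial_centers`, the use of the mean distance to the nearest neighbours as a stand-in for the distance outlier factor, the radius-based neighbour count as a stand-in for the weighted density, and the way the three terms are multiplied. The attribute weights `w_num` and `w_cat` are taken as inputs here; the paper derives them from granular neighborhood entropy, which this sketch does not compute.

```python
import numpy as np

def pairwise_mixed_distance(X_num, X_cat, w_num, w_cat):
    """Weighted distance for mixed data (assumed form): weighted squared
    differences on numeric attributes plus weighted mismatches on
    categorical attributes, under a square root."""
    diff = X_num[:, None, :] - X_num[None, :, :]          # shape (n, n, p_num)
    num_part = (w_num * diff ** 2).sum(axis=2)
    mismatch = X_cat[:, None, :] != X_cat[None, :, :]     # shape (n, n, p_cat)
    cat_part = (w_cat * mismatch).sum(axis=2)
    return np.sqrt(num_part + cat_part)

def select_initial_centers(X_num, X_cat, k, w_num, w_cat,
                           n_neighbors=3, radius=None):
    """Pick k initial centers that are dense, not outlying, and far
    from the centers already chosen (hypothetical scoring rule)."""
    D = pairwise_mixed_distance(X_num, X_cat, w_num, w_cat)
    n = D.shape[0]

    # Stand-in outlier score: mean distance to the nearest neighbours,
    # scaled to [0, 1]. Column 0 of the sorted rows is the zero
    # self-distance, so it is skipped.
    knn_mean = np.sort(D, axis=1)[:, 1:n_neighbors + 1].mean(axis=1)
    scale = knn_mean.max()
    outlier = knn_mean / scale if scale > 0 else np.zeros(n)

    # Stand-in density: fraction of objects within `radius`.
    # The default radius (mean pairwise distance) is an arbitrary choice.
    if radius is None:
        radius = D[np.triu_indices(n, k=1)].mean()
    density = (D <= radius).sum(axis=1) / n

    base = density * (1.0 - outlier)
    centers = [int(np.argmax(base))]
    while len(centers) < k:
        # Favour candidates far from every center chosen so far.
        min_dist = D[:, centers].min(axis=1)
        score = base * min_dist
        score[centers] = -np.inf                          # never re-pick a center
        centers.append(int(np.argmax(score)))
    return centers
```

The returned indices could then seed a K-prototype run in place of randomly chosen objects. In this sketch the most outlying object gets an outlier score of 1 and therefore a base score of 0, which is one simple way to keep it from being chosen as a center.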

Keywords:	initialization of clustering; mixed-type data; outlier detection; neighborhood rough set; granular neighborhood entropy; distance outlier factor; weighted density; weighted distance
