Mixed data clustering initialization method using outlier detection technology
Original title (Chinese):采用离群点检测技术的混合型数据聚类初始化方法
Cite this article:YANG Zhiyong, JIANG Feng, YU Xu, DU Junwei. Mixed data clustering initialization method using outlier detection technology[J]. CAAI Transactions on Intelligent Systems (智能系统学报), 2023, 18(1): 56-65.
Authors:YANG Zhiyong  JIANG Feng  YU Xu  DU Junwei
Affiliation:School of Information Science & Technology, Qingdao University of Science and Technology, Qingdao 266100, Shandong, China
Abstract:In recent years, the clustering of mixed-type data has received wide attention. The K-prototype clustering algorithm is an effective method for processing mixed-type data, but it usually initializes its cluster centers by random selection, a strategy that makes it difficult to guarantee the quality of the clustering results in many practical applications. To address this problem, we select the initial centers for the K-prototype algorithm with a strategy based on outlier detection, and propose a new initialization algorithm for mixed-type data clustering: Initialization of K-prototype Clustering Based on Outlier Detection and Density (IKP-ODD). Given a candidate object, IKP-ODD decides whether it is an initial center by calculating its distance outlier factor, its weighted density, and its weighted distances from the existing initial centers. Using the distance outlier factor and the weighted density prevents outliers from being selected as initial centers. When calculating the weighted densities of objects and the weighted distances between objects, we use the granular neighborhood entropy from neighborhood rough sets to calculate the significance of each attribute, and assign each attribute a weight according to its significance, which effectively reflects the differences between attributes. Experiments on several UCI datasets show that IKP-ODD solves the initialization problem of K-prototype clustering better than existing initialization methods.

Keywords:initialization of clustering  mixed-type data  outlier detection  neighborhood rough set  granular neighborhood entropy  distance outlier factor  weighted density  weighted distance
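
The selection loop described in the abstract (screen out outliers, then prefer dense candidates that lie far from the centers already chosen) can be illustrated with a minimal Python sketch. It is not the authors' IKP-ODD: the abstract does not give the formulas for the distance outlier factor, the weighted density, or the granular neighborhood entropy, so everything below is an assumed stand-in. The names `weighted_distance`, `select_initial_centers`, `n_neighbors`, and `outlier_ratio` are hypothetical, the attribute weights are simply passed in rather than derived from neighborhood rough sets, and the outlier score is a plain mean distance to the nearest neighbors.

```python
import statistics
from typing import List, Sequence, Tuple

# One object = (numeric attribute values, categorical attribute values).
MixedObject = Tuple[Tuple[float, ...], Tuple[str, ...]]


def weighted_distance(a: MixedObject, b: MixedObject,
                      w_num: Sequence[float], w_cat: Sequence[float]) -> float:
    """Assumed mixed-type dissimilarity: weighted squared difference on numeric
    attributes plus weighted mismatch count on categorical attributes."""
    num = sum(w * (x - y) ** 2 for w, x, y in zip(w_num, a[0], b[0]))
    cat = sum(w for w, x, y in zip(w_cat, a[1], b[1]) if x != y)
    return num + cat


def select_initial_centers(data: Sequence[MixedObject], k: int,
                           w_num: Sequence[float], w_cat: Sequence[float],
                           n_neighbors: int = 2,
                           outlier_ratio: float = 3.0) -> List[int]:
    """Return indices of up to k initial centers (illustrative heuristic only)."""
    n = len(data)
    dist = [[weighted_distance(data[i], data[j], w_num, w_cat) for j in range(n)]
            for i in range(n)]

    # Stand-in outlier score: mean distance to the n_neighbors nearest objects.
    score = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:n_neighbors]
        score.append(sum(nearest) / len(nearest))

    # Stand-in density: high when the nearest neighbors are close.
    density = [1.0 / (1.0 + s) for s in score]

    # Exclude objects whose score is far above the median score.
    cutoff = outlier_ratio * statistics.median(score)
    candidates = [i for i in range(n) if score[i] <= cutoff]

    centers: List[int] = []
    while len(centers) < k and candidates:
        if not centers:
            # First center: the densest non-outlier object.
            best = max(candidates, key=lambda i: density[i])
        else:
            # Later centers: dense and far from every center chosen so far.
            best = max(candidates,
                       key=lambda i: density[i] * min(dist[i][c] for c in centers))
        centers.append(best)
        candidates.remove(best)
    return centers


# Toy data: two tight groups (indices 0-2 and 3-5) and one isolated object (index 6).
data = [((0.0,), ("a",)), ((0.1,), ("a",)), ((0.2,), ("a",)),
        ((5.0,), ("b",)), ((5.1,), ("b",)), ((5.2,), ("b",)),
        ((50.0,), ("c",))]
centers = select_initial_centers(data, k=2, w_num=[1.0], w_cat=[1.0])
print(centers)
```

On this toy data the isolated object is filtered out before selection, so the two centers come from the two groups rather than from the outlier, which is the failure mode of random initialization that the abstract describes.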