
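Clustering Method by Combining Simplex Mapping and Entropy Weighting
Cite this article: AN Ning, JIANG Siyuan, TANG Chen, YANG Jiaoyun. Clustering Method by Combining Simplex Mapping and Entropy Weighting[J]. Computer Engineering and Applications, 2020, 56(9): 148-155.
Authors: AN Ning, JIANG Siyuan, TANG Chen, YANG Jiaoyun
Affiliations: 1. National Smart Eldercare International S&T Cooperation Base, Hefei University of Technology, Hefei 230601; 2. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601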
Funding: Programme of Introducing Talents of Discipline to Universities; Anhui Provincial Key Research and Development Program
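Abstract: Because categorical and numerical attributes differ in nature, clustering algorithms for mixed-type data usually have to treat the two attribute types separately, which makes such algorithms harder to design and implement. In addition, different attributes carry different amounts of information, yet existing algorithms typically treat all attributes equally. This paper proposes a clustering algorithm for mixed-type data that combines simplex mapping with information-entropy weighting. Categorical attributes are mapped to high-dimensional numerical attribute vectors based on simplex theory, information entropy theory is applied to assign a weight to each attribute and build a similarity measure, and this measure is used within the K-Means framework to obtain the clustering algorithm. Experiments on six mixed datasets from UCI show that the proposed algorithm outperforms the traditional mapping-based clustering algorithm and the K-Prototype algorithm, improving accuracy by 2.70% and 18.33%, respectively.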

Keywords: vector mapping; entropy weighting; similarity measure; mixed datasets; cluster analysis

Clustering Method by Combining Simplex Mapping and Entropy Weighting
AN Ning, JIANG Siyuan, TANG Chen, YANG Jiaoyun. Clustering Method by Combining Simplex Mapping and Entropy Weighting[J]. Computer Engineering and Applications, 2020, 56(9): 148-155.
Authors: AN Ning, JIANG Siyuan, TANG Chen, YANG Jiaoyun
Affiliation: 1. National Smart Eldercare International S&T Cooperation Base, Hefei University of Technology, Hefei 230601, China; 2. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
Abstract: Because categorical and numerical attributes differ in nature, researchers usually need to handle the two types of attributes separately when designing clustering methods for mixed datasets, which makes such methods harder to design and implement. In addition, the amount of information carried by different attributes varies considerably, yet current methods treat all attributes equally. This paper proposes a weighted simplex-based mapping method for mixed data clustering. It maps categorical attributes to high-dimensional numerical attributes based on simplex theory and applies entropy theory to weight the attributes and establish a similarity measure. This measure is integrated into the K-Means framework to form a clustering method. Experiments on 6 UCI mixed datasets show that the proposed method outperforms the traditional mapping method and the K-Prototype method, improving accuracy by 2.70% and 18.33%, respectively.
Keywords: vector mapping; entropy-based weight; similarity measurement; mixed datasets; clustering analysis
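
The abstract names three steps: mapping categorical values to simplex vertices, weighting attributes by entropy, and using the resulting measure inside K-Means. The sketch below illustrates only the first two steps and the weighted distance; it is not the authors' implementation, and the abstract does not give their formulas. It assumes, purely for illustration, that a categorical attribute with k values is mapped to scaled one-hot vectors (which are the vertices of a regular simplex with unit edge length) and that each attribute's weight is its Shannon entropy divided by the sum of all entropies. All function names are hypothetical.

```python
import math
from collections import Counter


def simplex_vertices(categories):
    """Map each distinct category to a vertex of a regular simplex.

    Assumption: scaled one-hot vectors are used, so any two distinct
    categories end up at Euclidean distance 1 from each other.
    """
    cats = sorted(set(categories))
    k = len(cats)
    scale = 1.0 / math.sqrt(2.0)
    return {c: [scale if i == j else 0.0 for j in range(k)]
            for i, c in enumerate(cats)}


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def entropy_weights(columns):
    """Assumed weighting: per-attribute entropy, normalized to sum to 1."""
    ents = [shannon_entropy(col) for col in columns]
    total = sum(ents)
    if total == 0:  # every attribute is constant: fall back to equal weights
        return [1.0 / len(columns)] * len(columns)
    return [e / total for e in ents]


def weighted_sq_distance(x, y, weights):
    """Weighted squared Euclidean distance.

    x and y are lists of per-attribute vectors; weights has one entry
    per attribute.
    """
    return sum(w * sum((a - b) ** 2 for a, b in zip(u, v))
               for u, v, w in zip(x, y, weights))


# Toy data with two categorical attributes (made up for this example).
colour = ["red", "green", "blue", "red"]
shape = ["round", "round", "round", "square"]

cmap = simplex_vertices(colour)
smap = simplex_vertices(shape)
weights = entropy_weights([colour, shape])

rows = [[cmap[c], smap[s]] for c, s in zip(colour, shape)]
d = weighted_sq_distance(rows[0], rows[1], weights)  # red/round vs green/round
```

In this toy data the colour attribute has higher entropy than shape, so under the assumed weighting it contributes more to the distance. Numerical attributes could be passed as one-element vectors after normalization, and the weighted distance could then stand in for the plain Euclidean distance in a K-Means assignment step.
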
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).