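Geometric double-entity model for recognizing far-near relations of clusters
Cite this article: WANG KaiJun, YAN XuanHui, CHEN LiFei. Geometric double-entity model for recognizing far-near relations of clusters [J]. Scientia Sinica Informationis, 2012(1): 99-110.
Authors: WANG KaiJun, YAN XuanHui, CHEN LiFei
Affiliation: School of Mathematics and Computer Science, Fujian Normal University
Funding: Supported by a Class A project of the Fujian Provincial Department of Education (Grant No. JA09043) and the Special Research Fund for Provincial Universities of Fujian Province (Grant No. JK2009006)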
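Abstract: Solving many practical problems requires not only the class labels produced by a clustering algorithm but also, more critically, the discrimination of far-near relations between clusters. In the difficult case of high-dimensional data with many clusters, clustering-result visualization methods based on dimension reduction typically show clusters as overlapping, interlaced, or forcibly pulled apart, so the far-near relations between some clusters cannot be distinguished or are displayed incorrectly; existing inter-cluster distance methods, meanwhile, cannot reveal whether two clusters are far apart or close together. This paper proposes the geometric double-entity model method to describe the inter-cluster relation of two clusters, and designs measures of the far-near degree between clusters, including the relative border distance, the absolute border distance, and the region dense degree. The method considers both the absolute distance between the nearest sample sets of two clusters and the dense degree of the cluster border regions; its advantage is that it can accurately reveal inter-cluster relations in high-dimensional space even in the difficult case described above. Experimental results on real data sets show that the geometric double-entity model method can effectively identify far-near relations between clusters that existing clustering visualization methods cannot discern.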

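Keywords: geometric double-entity model; far-near relations between clusters; large number of clusters; high-dimensional data; partitional clustering algorithms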

Geometric double-entity model for recognizing far-near relations of clusters
WANG KaiJun, YAN XuanHui & CHEN LiFei. Geometric double-entity model for recognizing far-near relations of clusters [J]. Scientia Sinica Informationis, 2012(1): 99-110.
Authors: WANG KaiJun, YAN XuanHui & CHEN LiFei
Affiliation: School of Mathematics and Computer Science, Fujian Normal University, Fuzhou 350108, China
Abstract: Solving many practical problems requires not only the sample labels produced by a clustering algorithm but also the recognition of far-near relations between clusters. Under the difficult condition of many clusters in a high-dimensional data set, clustering visualization methods based on dimension reduction often show clusters as overlapping, interlaced, or artificially pushed apart; as a result, the far-near relations of some clusters are displayed wrongly or cannot be distinguished. Existing inter-cluster distance methods, in turn, cannot determine whether two clusters are far apart or close together. The geometric double-entity model (GDEM) method is proposed to describe far-near relations between two clusters, and measures such as the relative border distance, the absolute border distance, and the region dense degree are designed to quantify the far-near degree between clusters. GDEM considers both the absolute distance between the nearest sample sets of two clusters and the dense degrees of their border regions, so it can accurately uncover far-near relations of clusters in a high-dimensional space, even under the difficult condition described above. Experimental results on four real data sets show that the proposed method effectively recognizes far-near relations of clusters that conventional methods cannot distinguish.
Keywords: geometric double-entity model; far-near relations of clusters; many clusters; high-dimensional dataset; partitional clustering algorithms
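
As a rough illustration of why a distance based on the nearest sample sets of two clusters can differ from a conventional inter-cluster distance, the sketch below compares the two on toy data. It is not the paper's GDEM: the abstract does not give the definitions of the relative border distance, absolute border distance, or region dense degree, so the function names, the parameter `k`, the averaging rule, and the choice of centroid distance as the "conventional" baseline are all assumptions made for illustration.

```python
import math
from itertools import product


def nearest_set_distance(cluster_a, cluster_b, k=3):
    """Hypothetical measure: mean of the k smallest Euclidean distances
    over all cross-cluster sample pairs (not the paper's definition)."""
    dists = sorted(math.dist(p, q) for p, q in product(cluster_a, cluster_b))
    k = min(k, len(dists))
    return sum(dists[:k]) / k


def centroid_distance(cluster_a, cluster_b):
    """Euclidean distance between cluster centroids, used here as one
    example of a conventional inter-cluster distance (an assumption)."""
    def centroid(cluster):
        n = len(cluster)
        dim = len(cluster[0])
        return [sum(point[i] for point in cluster) / n for i in range(dim)]

    return math.dist(centroid(cluster_a), centroid(cluster_b))


# Two elongated toy clusters whose borders nearly touch.
cluster_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
cluster_b = [(4.0, 0.0), (5.0, 0.0), (6.0, 0.0), (7.0, 0.0)]

border = nearest_set_distance(cluster_a, cluster_b, k=1)
center = centroid_distance(cluster_a, cluster_b)
print(f"nearest-set distance: {border}, centroid distance: {center}")
```

In this toy case the centroids are several units apart even though the closest samples of the two clusters are adjacent, which is the kind of discrepancy a border-based measure is meant to expose. Both functions work on points of any dimension, since they operate in the original feature space rather than on a reduced projection.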
This article is indexed in CNKI and other databases.