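Multi-instance Clustering Based on Earth Mover's Distance (EMD)
Cite this article: LI Zhan, PENG Jin-ye, WEN Chao. Multi-instance Clustering Based on EMD [J]. Computer Science (计算机科学), 2011, 38(7): 235-239.
Authors: LI Zhan, PENG Jin-ye, WEN Chao
Affiliations: 1. School of Information Science and Technology, Northwest University, Xi'an 710069
2. School of Information Science and Technology, Northwest University, Xi'an 710069; School of Electronics Information, Northwestern Polytechnical University, Xi'an 710072
Funding: Supported by the Ministry of Education Program for New Century Excellent Talents (NCET-07-0693) and the Scientific Research Program of the Shaanxi Provincial Education Department (10JK852).
Abstract: In multi-instance learning, a bag consists of multiple instances; the bag carries a definite label, whereas the labels of its instances are uncertain. Existing clustering research targets single-instance, single-label data and therefore cannot be applied directly to multi-instance problems. Based on the earth mover's distance (EMD), this paper proposes a new multi-instance clustering algorithm, ECMIL. The method first uses the Euclidean distance to compute the similarity between the instances within a bag and merges similar instances. It then treats the instances of the two bags whose distance is to be measured as suppliers and consumers, respectively, and computes the amount of goods held and the amount of goods demanded. The case in which the suppliers cannot meet the demand under the earth mover's distance is handled by increasing the weights of the suppliers that satisfy the condition. Finally, the k-medoids algorithm is used for clustering. Experiments on the benchmark data sets MUSK, Corel, and SIVAL show that the ECMIL algorithm is effective.

Keywords: multi-instance clustering, earth mover's distance, k-medoids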
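
The bag-to-bag distance described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' ECMIL implementation: the function names `merge_similar` and `emd`, the merging threshold `radius`, and the use of merged-group sizes as goods are all assumptions made for illustration. In particular, the paper's rule for increasing supplier weights is not given in the abstract, so the sketch swaps in a common alternative and normalises each bag's weights to the same total, which guarantees that supply always matches demand. The transportation problem is solved with successive shortest augmenting paths.

```python
import math

def merge_similar(bag, radius):
    """Greedily pool instances lying within `radius` (Euclidean distance) of a
    group's first member. Returns group centroids and group sizes.
    `radius` is a hypothetical threshold, not a value from the paper."""
    groups = []
    for x in bag:
        for g in groups:
            if math.dist(x, g[0]) <= radius:
                g.append(x)
                break
        else:
            groups.append([x])
    centroids = [tuple(sum(col) / len(g) for col in zip(*g)) for g in groups]
    return centroids, [len(g) for g in groups]

def emd(points_a, counts_a, points_b, counts_b):
    """EMD between two weighted point sets, each side normalised to total
    weight 1 (an assumption standing in for the paper's weight adjustment)."""
    n, m = len(points_a), len(points_b)
    total_a, total_b = sum(counts_a), sum(counts_b)
    # Cross-multiply so both sides carry the same integer total mass.
    supply = [c * total_b for c in counts_a]
    demand = [c * total_a for c in counts_b]
    cost = [[math.dist(p, q) for q in points_b] for p in points_a]
    flow = [[0] * m for _ in range(n)]
    eps = 1e-12
    while any(supply):
        # Bellman-Ford on the residual graph; nodes 0..n-1 are suppliers,
        # nodes n..n+m-1 are consumers.
        dist = [0.0 if s else math.inf for s in supply] + [math.inf] * m
        prev = [None] * (n + m)
        for _ in range(n + m):
            changed = False
            for i in range(n):
                for j in range(m):
                    # Forward edge: ship from supplier i to consumer j.
                    if dist[i] + cost[i][j] < dist[n + j] - eps:
                        dist[n + j], prev[n + j] = dist[i] + cost[i][j], i
                        changed = True
                    # Backward edge: cancel flow already shipped from i to j.
                    if flow[i][j] and dist[n + j] - cost[i][j] < dist[i] - eps:
                        dist[i], prev[i] = dist[n + j] - cost[i][j], n + j
                        changed = True
            if not changed:
                break
        # Cheapest-to-reach consumer that still has unmet demand.
        end = min((j for j in range(m) if demand[j]), key=lambda j: dist[n + j])
        path, node = [], n + end
        while prev[node] is not None:
            path.append((prev[node], node))
            node = prev[node]
        amount = min(supply[node], demand[end])
        for u, v in path:
            if u >= n:  # backward edge limits how much can be rerouted
                amount = min(amount, flow[v][u - n])
        for u, v in path:
            if u < n:
                flow[u][v - n] += amount
            else:
                flow[v][u - n] -= amount
        supply[node] -= amount
        demand[end] -= amount
    moved = sum(flow[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return moved / (total_a * total_b)

# Usage: merge each bag, then measure the distance between the merged bags.
bag_a = [(0.0, 0.0), (0.0, 0.2), (5.0, 5.0)]
bag_b = [(0.1, 0.0), (4.0, 5.0)]
print(emd(*merge_similar(bag_a, 0.5), *merge_similar(bag_b, 0.5)))
```

Normalising the weights keeps the sketch short and makes the distance symmetric; a faithful reimplementation would replace that step with the weight-increase rule defined in the full paper.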

Multi-instance Clustering Based on EMD
LI Zhan, PENG Jin-ye, WEN Chao. Multi-instance Clustering Based on EMD [J]. Computer Science, 2011, 38(7): 235-239.
Authors: LI Zhan, PENG Jin-ye, WEN Chao
Affiliation:(School of Information Science and Technology,Northwest University,Xi'an 710069,China);(School of Electronics Information,Northwestern Polytechnical University,Xi'an 710072,China)
Abstract: In multi-instance learning, each sample is represented by a bag composed of multiple instances. Previous studies on clustering mainly deal with single-instance data in the traditional learning setting, so they cannot be applied directly to multi-instance problems. In this paper, a novel multi-instance clustering algorithm named ECMIL, based on the earth mover's distance (EMD), is presented. First, the similarity between the instances within a bag is calculated and similar instances are merged. The instances of the two bags being compared are then regarded as suppliers and consumers, and their goods and capacities are calculated. The supplier-consumer imbalance problem is handled by multiplying the goods. Finally, k-medoids is used to cluster the multi-instance data. Experimental results on the MUSK, Corel, and SIVAL data sets indicate that the ECMIL method is effective.
Keywords: multi-instance clustering, earth mover's distance, k-medoids
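
The final clustering step can be sketched as follows. This is a minimal alternating k-medoids in Python over a precomputed bag-to-bag distance matrix (for example, pairwise EMD values), offered as an illustration and not as the variant used in the paper. The function name `k_medoids`, the `max_iter` cap, and the choice of the first `k` items as initial medoids are assumptions.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed symmetric distance matrix.
    Starts from the first k items as medoids (a deliberately simple,
    deterministic initialisation). Returns (medoids, labels)."""
    n = len(dist)
    medoids = list(range(k))

    def assign(meds):
        # Each item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster lost all of its members.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)

# Usage with a toy distance matrix built from points on a line.
values = [0, 1, 2, 10, 11, 12]
toy_dist = [[abs(a - b) for b in values] for a in values]
medoids, labels = k_medoids(toy_dist, 2)
print(medoids, labels)
```

Because k-medoids only needs pairwise distances and always picks an actual item as the cluster centre, it works with a bag-level distance such as EMD, where averaging bags to form a centroid is not well defined.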
This article is indexed in Wanfang Data and other databases.