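Gravitational synchronization clustering for large-scale datasets
Cite this article: QIAO Ying, WANG Shi-tong, HANG Wen-long. Gravitational synchronization clustering for large-scale datasets[J]. Control and Decision, 2017, 32(6): 1075-1083.
Authors: QIAO Ying, WANG Shi-tong, HANG Wen-long
Affiliation: School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122 (all three authors)
Funding: National Natural Science Foundation of China (61272210, 61170122); Natural Science Foundation of Jiangsu Province (BK20130155).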
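Abstract: Inspired by the Kuramoto model, a new universal-gravitation synchronization model is constructed to address the high time complexity of existing synchronization clustering algorithms, and a gravitational synchronization clustering algorithm for large-scale datasets (LSCGS) is proposed. First, the fast reduced set density estimator (RSDE) algorithm compresses the large-scale dataset. Then the gravitational synchronization clustering algorithm clusters the compressed dataset, using the Davies-Bouldin index to find the best number of clusters automatically. Finally, the proposed remaining sample clustering (RSC) algorithm clusters the data outside the reduced set, which effectively distinguishes isolated clusters and noise points. Experiments on large-scale synthetic data, real UCI datasets, and image data verify the effectiveness of LSCGS; compared with traditional synchronization clustering algorithms, the computational cost of clustering is greatly reduced.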
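For orientation, the Kuramoto-style synchronization clustering that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the baseline idea only, in which each point is repeatedly pulled toward the points in its neighbourhood until groups collapse to a common location. It is not the gravitational model of LSCGS, whose interaction term the abstract does not specify. The names `sync_step`, `sync_cluster`, `eps`, `n_iter`, and `tol` are hypothetical, and the neighbourhood here is assumed to include the point itself.

```python
import math

def euclid(a, b):
    """Euclidean distance between two coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sync_step(points, eps):
    """One synchronous update: every point moves toward its eps-neighbourhood.

    Assumption: the neighbourhood includes the point itself, so it is never
    empty and the update is damped.
    """
    updated = []
    for p in points:
        nbrs = [q for q in points if euclid(p, q) <= eps]
        # Sine coupling per coordinate, averaged over the neighbourhood.
        updated.append([
            x + sum(math.sin(q[d] - x) for q in nbrs) / len(nbrs)
            for d, x in enumerate(p)
        ])
    return updated

def sync_cluster(points, eps, n_iter=50, tol=1e-3):
    """Iterate the update, then label points that ended up at the same place."""
    pts = [list(p) for p in points]
    for _ in range(n_iter):
        pts = sync_step(pts, eps)
    labels, centres = [], []
    for p in pts:
        for k, c in enumerate(centres):
            if euclid(p, c) <= tol:
                labels.append(k)
                break
        else:
            # No existing centre is close enough: start a new cluster.
            centres.append(p)
            labels.append(len(centres) - 1)
    return labels
```

Every update compares each point with every other point, so the cost per iteration grows quadratically with the number of samples. That is the expense the abstract aims to reduce by clustering only a compressed subset and assigning the remaining samples afterwards.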

Keywords: large-scale data; fast reduced set density estimator; universal gravitation; synchronization clustering

Clustering by gravitational synchronization on large scale dataset
QIAO Ying,WANG Shi-tong and HANG Wen-long.Clustering by gravitational synchronization on large scale dataset[J].Control and Decision,2017,32(6):1075-1083.
Authors:QIAO Ying  WANG Shi-tong and HANG Wen-long
Affiliation: School of Digital Media, Jiangnan University, Wuxi 214122, China (all three authors)
Abstract: Unlike the existing synchronization clustering algorithm (Sync), which was recently proposed on the basis of the Kuramoto model in physics, a novel clustering algorithm for large datasets, called large sample clustering by gravitational synchronization (LSCGS), is proposed with reference to the law of gravitation. First, a large-scale dataset is condensed into a reduced dataset using the reduced set density estimator method. Then the reduced dataset is clustered with the proposed gravitational synchronization clustering model, using the Davies-Bouldin clustering criterion to select the most suitable clustering result. Finally, the remaining samples in the large dataset are clustered. The proposed method can detect clusters of arbitrary shape, size, and number without any assumptions about the data distribution. Extensive experiments on large-scale synthetic data, UCI real-world datasets, and image segmentation indicate that LSCGS effectively detects clusters of arbitrary shape and achieves high clustering accuracy with lower execution time.
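The Davies-Bouldin criterion mentioned above scores a partition by comparing within-cluster scatter to between-cluster separation, with lower values indicating a better partition. The sketch below uses the standard definition of the index with Euclidean distances and centroid-based scatter. How LSCGS generates the candidate partitions is not described in the abstract, so the function name `davies_bouldin` and the example data are purely illustrative assumptions.

```python
import math

def euclid(a, b):
    """Euclidean distance between two coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled partition (lower is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    centroids, scatter = {}, {}
    for lab, members in clusters.items():
        c = [sum(col) / len(members) for col in zip(*members)]
        centroids[lab] = c
        # Scatter: mean distance of the cluster's members to its centroid.
        scatter[lab] = sum(euclid(p, c) for p in members) / len(members)

    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        # Note: coincident centroids would cause a division by zero here.
        total += max(
            (scatter[i] + scatter[j]) / euclid(centroids[i], centroids[j])
            for j in clusters if j != i
        )
    return total / len(clusters)

# Illustrative usage: keep the candidate labelling with the smallest index.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
candidates = [[0, 0, 1, 1], [0, 1, 0, 1]]
best = min(candidates, key=lambda labs: davies_bouldin(points, labs))
```

In this toy example the labelling that groups nearby points together yields the smaller index, which is the behaviour a model-selection loop relies on when choosing among clusterings of the reduced dataset.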