首页 | 官方网站   微博 | 高级检索  
     

GPU加速的改进PAM聚类算法研究与应用
引用本文:周恩波,毛善君,李梅,孙振明. GPU加速的改进PAM聚类算法研究与应用[J]. 地球信息科学学报, 2017, 19(6): 782-791. DOI: 10.3724/SP.J.1047.2017.00782
作者姓名:周恩波  毛善君  李梅  孙振明
作者单位:1. 北京大学遥感与地理信息系统研究所,北京 1008712. 中国矿业大学(北京)资源与安全工程学院,北京 100083
基金项目:国家重点研发计划重点专项(2016YFC0801800)
摘    要:空间聚类是空间数据挖掘的重要方法,而K-Medoids是一种常用的空间聚类算法。K-Medoids聚类算法存在初始点选择问题,而且计算复杂。为了提高算法的有效性和时间效率,本文结合模拟退火算法思想,改进了传统的K-Medoids算法PAM,提出一种基于GPU计算的并行模拟退火PAM算法。类比矩阵乘法运算,定义了一种新的矩阵计算方法,可以有效减少数据在GPU全局内存和共享内存之间的传输,提高了算法在GPU中的执行效率。利用模拟退火算法搜索聚类中心点,保证了聚类结果的全局最优性。基于不同的数据集,将串行和并行模拟退火PAM算法以及已有的遗传PAM算法进行比较,结果表明并行模拟退火PAM算法聚类结果正确,且时间效率高。最后,应用本文改进算法对贵州省安监系统的安全监管隐患数据进行聚类分析,发现了隐患聚集中心,相关结果对政府的决策具有一定的实际应用价值。

关 键 词:K-Medoids  模拟退火  GPU  并行计算  空间聚类分析  
收稿时间:2016-10-11

Research and Application of Accelerating Improved PAM Clustering Algorithm by GPU
ZHOU Enbo,MAO Shanjun,LI Mei,SUN Zhenming. Research and Application of Accelerating Improved PAM Clustering Algorithm by GPU[J]. Geo-information Science, 2017, 19(6): 782-791. DOI: 10.3724/SP.J.1047.2017.00782
Authors:ZHOU Enbo  MAO Shanjun  LI Mei  SUN Zhenming
Affiliation:1. Institute of Remote Sensing and Geographical Information System, Peking University, Beijing 100871, China2. Resources and Safety Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China;
Abstract:Spatial clustering is one of the most important methods in spatial data mining. As a common but powerful spatial clustering algorithm, K-Medoids is applied in many fields such as generalization of spatial entity information, spatial point pattern analysis and epidemiology application. However, K-Medoids algorithm meets two main challenges innately as follow. At first, K-Medoids has selection problem of the initial medoids. Different initial medoids may not attain the same clustering results which could lead to a non-optimal results sometimes. Furthermore, time efficiency of the algorithm is not satisfactory because there exist quantities of iterations to find the most suitable partition. Existing studies on the K-Medoids algorithm don’t take the validness and time efficiency into consideration at the same time. Optimal methods like the Genetic Algorithm are applied to improve the validness of K-Medoids but the time efficiency is not acceptable when dealing with growing data. The MapReduce model is utilized to handle with data of high volume which can’t adapt to some circumstances short of computer clusters. In order to improve the result validity and time efficiency of the algorithm, this paper revised the traditional K-Medoids algorithm of Partitioning Around Medoids (PAM) combining with the idea of the Simulate Anneal Arithmetic (SAA) and proposed a parallel Simulate Anneal Partitioning Around Medoids (SAPAM) algorithm which was implemented efficiently in Graphics Processing Units (GPUs). SAA algorithm is used to search for the initial medoids which promises the validness of the algorithm. The stochastic factor introduced in SAA algorithm gives the possibility of eliminating the local optima to attain the global optimal clustering results of PAM. To accelerate the clustering process, we design the parallel SAPAM algorithm to utilize quantities of GPU’s threads which execute the program at the same time. By analogy with the matrix multiplication, a new matrix computation method is defined to reduce the time consumption of data transfer between GPU’s global memory and shared memory. The matrix computation method reuses data in the shared memory of GPU and computes the distances between medoids and many points at a time which improve the algorithm’s performance evidently. To validate the proposed algorithm, we generated eight datasets with different attributes and sizes randomly and conducted experiments on the eight datasets to compare the proposed parallel SAPAM algorithm with the traditional PAM algorithm, sequential SAPAM algorithm and the parallel genetic K-Medoids algorithm. The experiment results showed that SAPAM algorithm attained more accurate clustering results compared with the traditional PAM and the parallel genetic K-Medoids algorithm. Besides, the proposed algorithm performed better than the sequential SAPAM algorithm and the parallel genetic K-Medoids algorithm in time efficiency. According to the results, our GPU-based SAPAM algorithm was four to eight times faster than the traditional PAM algorithm. The results demonstrate that the proposed method can execute efficiently and attain a valid result. Finally, SAPAM algorithm was applied to analyze the safety monitoring data of Guizhou province to get the clustering pattern of the safety threats. The clustering results show us several clusters of the safety threats which may provide some practical application value to the governor.
Keywords:K-Medoids  Simulate Anneal Arithmetic  GPU  Parallel Computing  Spatial Clustering Analysis  
本文献已被 CNKI 等数据库收录!
点击此处可从《地球信息科学学报》浏览原始摘要信息
点击此处可从《地球信息科学学报》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号