首页 | 官方网站   微博 | 高级检索  
     

CPM-MCHM:一种基于极大团和哈希表的空间并置模式挖掘算法
引用本文:张绍雪,王丽珍,陈文和.CPM-MCHM:一种基于极大团和哈希表的空间并置模式挖掘算法[J].计算机学报,2022,45(3):526-541.
作者姓名:张绍雪  王丽珍  陈文和
作者单位:云南大学信息学院 昆明 650504
基金项目:国家自然科学基金项目(61966036,61662086);
摘    要:空间并置(co-location)模式挖掘是指在大量的空间数据中发现一组空间特征的子集,这些特征的实例在地理空间中频繁并置出现.传统的空间并置模式挖掘算法通常采用逐阶递增的挖掘框架,从低阶模式开始生成候选模式并计算其参与度(空间并置模式的频繁性度量指标).虽然这种挖掘框架可以得到正确和完整的结果,但是带来的时间和空间开销非常大.此外传统方法对于空间并置模式的最小频繁性阈值较为敏感,当最小频繁性阈值改变时整个挖掘过程需要重新进行.因此,本文提出一种基于极大团和哈希表的空间并置模式挖掘算法CPM-MCHM(Co-location Pattern Mining based on Maximal Clique and Hash Map)来发现完整并且正确的频繁空间并置模式.CPM-MCHM算法不仅避免逐阶候选-测试框架带来的巨大开销问题,还降低了算法对最小频繁性阈值的敏感.首先,采用基于位运算的分区Bron–Kerbosch算法生成给定空间数据集的所有极大团,并将其存储在哈希表中.然后,提出一种两阶段挖掘框架计算所有模式的参与度并过滤所有频繁空间并置模式.最后,在真实和合成数据集上进行了大量的对比实验.与经典的传统算法和近两年内学者提出的两种算法相比,当实验数据的规模达到20万实例数时,本文提出的CPM-MCHM算法的挖掘时间和空间耗费分别降低了90%和70%以上,当实验数据量进一步加大时CPM-MCHM算法的优势更加明显.

关 键 词:空间数据挖掘  空间并置模式  两阶段挖掘框架  极大团  哈希表

CPM-MCHM: A Spatial Co-location Pattern Mining Algorithm Based on Maximal Clique and Hash Map
ZHANG Shao-Xue,WANG Li-Zhen,TRAN Van-Ha.CPM-MCHM: A Spatial Co-location Pattern Mining Algorithm Based on Maximal Clique and Hash Map[J].Chinese Journal of Computers,2022,45(3):526-541.
Authors:ZHANG Shao-Xue  WANG Li-Zhen  TRAN Van-Ha
Affiliation:(School of Information Science and Engineering,Yunnan University,Kunming 650504)
Abstract:Spatial co-location pattern mining refers to the discovery of a set of spatial features in large spatial data sets,and instances of these features frequently appear together in the geographic space.The traditional co-location pattern mining algorithms usually employ an incremental mining framework,generating high-size candidate patterns from the low-size prevalent patterns and compute their participation index(prevalence metrics of the co-location pattern).The participation indexes of candidate patterns are compared with a minimum prevalence threshold given by users to filter prevalent co-location patterns.This iterative process continues to be executed size-by-size,all prevalent co-location patterns are generated completely.Although this mining framework can yield a correct and complete mining result,the cost of execution time and memory space is very expensive.In addition,this mining framework is very sensitive to the minimum prevalence threshold used to filtering prevalent co-location patterns.If the minimum prevalence threshold is changed,the whole mining process needs to be restarted.Since the row instances supporting the prevalence of a co-location pattern are a clique relation under the neighbor relationship,all row instances of the co-location patterns are included in the maximal cliques of the spatial data set,thus they can be used to generate all row instances of all spatial co-location patterns.Therefore,under the premise of discovery of complete and correct prevalent spatial co-location patterns,a Co-location Pattern Mining algorithm based on Maximal Clique and Hash Map(CPM-MCHM)is proposed to address the above shortcomings.The CPM-MCHM algorithm not only avoids the problem of the expensive running time posed by the size-by-size candidate-test framework,but also reduces the sensitivity of the algorithm to the minimum prevalence threshold.Firstly,a bit operation-based and partition Bron-Kerbosch algorithm is presented to enumerate all the maximal cliques of an input spatial data set.The performance of this enumeration maximal clique algorithm is examined by experiments to prove that it can quickly and accurately enumerate all maximal cliques.Secondly,these maximal cliques are compressed into the hash map,by utilizing the characteristic of the hash map to speed up collecting row instances of all co-location patterns.Finally,using the generated hash map and downward closure property of the participation index,a two-stage mining approach is proposed.The first stage generates clique candidate patterns and collects their participation instances,and then calculates the participation indexes of these candidate patterns.The second stage calculates the participation indexes of all remaining patterns based on the participation instances of features in the clique candidates of the first stage.Through the two stages,the participation indexes of all patterns can be obtained efficiently,thereby all prevalence co-location patterns can be effectively filtered.At the end of the paper,a lot of comparative experiments were conducted on real and synthetic data sets.Compared to a classical algorithm and two algorithms proposed in the past two years,when the scale of the experimental data reaches 200,000 spatial instances,the mining time and space cost of the proposed CPM-MCHM algorithm were reduced by more than 90%and 70%respectively,and the CPM-MCHM algorithm advantage of increasing the experimental data volume was more obvious.
Keywords:spatial data mining  spatial co-location pattern  two-stage mining framework  maximal clique  hash map
本文献已被 维普 万方数据 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号