首页 | 官方网站   微博 | 高级检索  
     

知识图谱划分算法研究综述
引用本文:王鑫,陈蔚雪,杨雅君,张小旺,冯志勇.知识图谱划分算法研究综述[J].计算机学报,2021,44(1):235-260.
作者姓名:王鑫  陈蔚雪  杨雅君  张小旺  冯志勇
作者单位:天津大学智能与计算学部 天津300354;天津市认知计算与应用重点实验室 天津300354;天津大学智能与计算学部 天津300354;天津大学智能与计算学部 天津300354;天津大学智能与计算学部 天津300354
基金项目:国家自然科学基金;本课题得到国家重点研发计划项目
摘    要:知识图谱是人工智能的重要基石,因其包含丰富的图结构和属性信息而受到广泛关注.知识图谱可以精确语义描述现实世界中的各种实体及其联系,其中顶点表示实体,边表示实体间的联系.知识图谱划分是大规模知识图谱分布式处理的首要工作,对知识图谱分布式存储、查询、推理和挖掘起基础支撑作用.随着知识图谱数据规模及分布式处理需求的不断增长,如何对其进行划分已成为目前知识图谱研究的热点问题.从知识图谱和图划分的定义出发,系统性地介绍当前知识图谱数据划分的各类算法,包括基本、多级、流式、分布式和其他类型图划分算法.首先,介绍4种基本图划分算法:谱划分算法、几何划分算法、分支定界算法、KL及其衍生算法,这类算法通常用于小规模图数据或作为其他划分算法的一部分;然后,介绍多级图划分算法,这类算法对图粗糙化后进行划分再投射回原始图,根据粗糙化过程分为基于匹配的算法和基于聚合的算法;其次,描述3种流式图划分算法,这类算法将顶点或边加载为序列后进行划分,包括Hash算法、贪心算法、Fennel算法,以及这3种算法的衍生算法;再次,介绍以KaPPa、JA-BE-JA和轻量级重划分为代表的分布式图划分算法及它们的衍生算法;同时,在其他类型图划分算法中,介绍近年来新兴的2种图划分算法:标签传播算法和基于查询负载的算法.通过在合成与真实知识图谱数据集上的丰富实验,比较了5类知识图谱代表性划分算法在划分效果、查询处理与图数据挖掘方面的性能差异,分析实验结果并推广到推理层面,获得了基于实验的知识图谱划分算法性能评价结论.最后,在对已有方法分析和比较的基础上,总结目前知识图谱数据划分面临的主要挑战,提出相应的研究问题,并展望未来的研究方向.

关 键 词:知识图谱  图划分  多级划分  流划分  分布式

Research on Knowledge Graph Partitioning Algorithms:A Survey
WANG Xin,CHEN Wei-Xue,YANG Ya-Jun,ZHANG Xiao-Wang,FENG Zhi-Yong.Research on Knowledge Graph Partitioning Algorithms:A Survey[J].Chinese Journal of Computers,2021,44(1):235-260.
Authors:WANG Xin  CHEN Wei-Xue  YANG Ya-Jun  ZHANG Xiao-Wang  FENG Zhi-Yong
Affiliation:(College of Intelligence and Computing,Tianjin University,Tianjin 300354;Tianjin Key Laboratory of Cognitive Computing and Application,Tianjin 300354)
Abstract:Knowledge graphs are important cornerstones of artificial intelligence,which not only have the graph structure of the general graph model but also contain rich attribute information of vertices and edges.Therefore,knowledge graphs have received widespread attention in practical tasks.Knowledge graphs can use accurate semantics to describe the various entities and their connections in the real world,where the vertices represent entities,and the edges represent connections among these entities.Knowledge graph partitioning is a fundamental task for distributed processing of large-scale knowledge graphs,and it is the essential support for a series of graph processing operations including distributed storage,query processing,reasoning,and data mining.With the increasing scale of knowledge graphs and more requirements of distributed processing,it is difficult to partition large-scale knowledge graphs,which has become a hot issue in the current research on knowledge graphs.Starting from the definitions of knowledge graphs and graph partitioning,we systematically introduce various algorithms currently available for knowledge graph partitioning,including basic graph partitioning algorithms,multilevel graph partitioning algorithms,streaming graph partitioning algorithms,distributed graph partitioning algorithms,and other types of graph partitioning algorithms.First,four basic graph partitioning algorithms are reviewed,including spectral partitioning algorithms,geometric partitioning algorithms,branch and bound algorithms,KL and its derivative algorithms.This type of algorithms usually works on small-scale graphs or as subroutines for more sophisticated algorithms.Second,two multilevel graph partitioning algorithms are introduced,one of which uses the matching-based algorithm during the coarsening phase,and the other uses the aggregation-based algorithm.After coarsening,they partition the much smaller graph and then project the result back to the original graph.Third,streaming graph partitioning algorithms are described,which load a graph as a sequence of vertices or edges and assign each element to the corresponding partitions.The streaming graph partitioning algorithms can be further divided into three subcategories:the hash algorithm,the greedy algorithm,the Fennel algorithm,and their derivative algorithms.Fourth,we introduce distributed graph partitioning algorithms for partitioning large-scale graphs in distributed environments.We choose three representative algorithms,namely the KaPPa algorithm,the JA-BE-JA algorithm,the lightweight repartitioning algorithm,and then describe derivative algorithms of these distributed algorithms.Meanwhile,as the other types of graph partitioning algorithms,we present two recent graph partitioning algorithms,one is the label propagation algorithm,and the other is the query workload-based algorithm.By conducting extensive experiments on both synthetic and real knowledge graph datasets,we compare the performance of five representative knowledge graph partitioning algorithms in partitioning effects,query processing,and graph data mining.We analyze the experimental results in detail and draw general conclusions,and extend them to the reasoning tasks.The performance evaluation conclusions of knowledge graph partitioning algorithms are obtained based on these experiments and analyses.Finally,based on the analysis and comparison of existing methods,we summarize the main challenges in the current research on knowledge graph partitioning,propose the corresponding issues that can be investigated,and look forward to the future research directions on this topic.
Keywords:knowledge graph  graph partitioning  multilevel partitioning  streaming partitioning  distribution
本文献已被 维普 万方数据 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号