期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Efficient parallel edge-centric approach for relaxed graph pattern matching

Bouhenni Sarra Yahiaoui Saïd Nouali-Taboudjemat Nadia Kheddouci Hamamache 《The Journal of supercomputing》2022,78(2):1642-1671

Prior algorithms on graph simulation for distributed graphs are not scalable enough as they exhibit heavy message passing. Moreover, they are dependent on the graph partitioning quality that can be a bottleneck due to the natural skew present in real-world data. As a result, their degree of parallelism becomes limited. In this paper, we propose an efficient parallel edge-centric approach for distributed graph pattern matching. We design a novel distributed data structure called ST that allows a fine-grain parallelism, and hence guarantees linear scalability. Based on ST, we develop a parallel graph simulation algorithm called PGSim. Furthermore, we propose PDSim, an edge-centric algorithm that efficiently evaluates dual simulation in parallel. PDSim combines ST and PGSim in a Split-and-Combine approach to accelerate the computation stages. We prove the effectiveness and efficiency of these propositions through theoretical guarantees and extensive experiments on massive graphs. The achieved results confirm that our approach outperforms existing algorithms by more than an order of magnitude.

相似文献

2.

Multi-level graph layout on the GPU

Frishman Y Tal A 《IEEE transactions on visualization and computer graphics》2007,13(6):1310-1319

This paper presents a new algorithm for force directed graph layout on the GPU. The algorithm, whose goal is to compute layouts accurately and quickly, has two contributions. The first contribution is proposing a general multi-level scheme, which is based on spectral partitioning. The second contribution is computing the layout on the GPU. Since the GPU requires a data parallel programming model, the challenge is devising a mapping of a naturally unstructured graph into a well-partitioned structured one. This is done by computing a balanced partitioning of a general graph. This algorithm provides a general multi-level scheme, which has the potential to be used not only for computation on the GPU, but also on emerging multi-core architectures. The algorithm manages to compute high quality layouts of large graphs in a fraction of the time required by existing algorithms of similar quality. An application for visualization of the topologies of ISP (Internet Service Provider) networks is presented. 相似文献

3.

Optimal broadcasting on the star graph 总被引：2，自引：0，他引：2

Mendia V.E. Sarkar D. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(4):389-396

The star graph has been show to be an attractive alternative to the widely used n-cube. Like the n-cube, the star graph possesses rich structure and symmetry as well as fault tolerant capabilities, but has a smaller diameter and degree. However, very few algorithms exists to show its potential as a multiprocessor interconnection network. Many fast and efficient parallel algorithms require broadcasting as a basic step. An optimal algorithm for one-to-all broadcasting in the star graph is proposed. The algorithm can broadcast a message to N processors in O(log₂ N) time. The algorithm exploits the rich structure of the star graph and works by recursively partitioning the original star graph into smaller star graphs. In addition, an optimal all-to-all broadcasting algorithm is developed 相似文献

4.

A communication-reduced and computation-balanced framework for fast graph computation

Yongli CHENG Fang WANG Hong JIANG Yu HUA Dan FENG Lingling ZHANG Jun ZHOU 《Frontiers of Computer Science》2018,12(5):887-907

The bulk synchronous parallel (BSP) model is very user friendly for coding and debugging parallel graph algorithms. However, existing BSP-based distributed graph-processing frameworks, such as Pregel, GPS and Giraph, routinely suffer from high communication costs. These high communication costs mainly stem from the fine-grained message-passing communication model. In order to address this problem, we propose a new computation model with low communication costs, called LCC-BSP. We use this model to design and implement a high-performance distributed graph-processing framework called LCC-Graph. This framework eliminates high communication costs in existing distributed graph-processing frameworks. Moreover, LCC-Graph also balances the computation workloads among all compute nodes by optimizing graph partitioning, significantly reducing the computation time for each superstep. Evaluation of LCC-Graph on a 32-node cluster, driven by real-world graph datasets, shows that it significantly outperforms existing distributed graph-processing frameworks in terms of runtime, particularly when the system is supported by a high-bandwidth network. For example, LCC-Graph achieves an order of magnitude performance improvement over GPS and GraphLab. 相似文献

5.

Efficient parallel algorithms for graph problems

Clyde P. Kruskal Larry Rudolph Marc Snir 《Algorithmica》1990,5(1-4):43-64

We present an efficient technique for parallel manipulation of data structures that avoids memory access conflicts. That is, this technique works on the Exclusive Read/Exclusive Write (EREW) model of computation, which is the weakest shared memory, MIMD machine model. It is used in a new parallel radix sort algorithm that is optimal for keys whose values are over a small range. Using the radix sort and known results for parallel prefix on linked lists, we develop parallel algorithms that efficiently solve various computations on trees and “unicycular graphs.” Finally, we develop parallel algorithms for connected components, spanning trees, minimum spanning trees, and other graph problems. All of the graph algorithms achieve linear speedup for all but the sparsest graphs. 相似文献

6.

半监督免疫克隆选择图划分方法

刘汉强《计算机工程与应用》2014,50(22):11-16

在聚类过程中利用一定量先验信息会显著提高聚类算法的性能。为了解决求解图谱划分方法NP难的问题并合理地利用一定量的先验信息,将成对限制信息引入到图谱划分方法中样本点的相似性测度,并在获得的相应的相似性矩阵的基础上,利用免疫克隆选择优化方法来优化图谱划分准则,提出了半监督免疫克隆选择图划分方法。USPS手写体数字集和UMIST人脸数据集识别的仿真实验证明了新方法的有效性。相似文献

7.

Parallel incremental graph partitioning

Chao-Wei Ou Ranka S. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(8):884-896

Partitioning graphs into equally large groups of nodes while minimizing the number of edges between different groups is an extremely important problem in parallel computing. For instance, efficiently parallelizing several scientific and engineering applications requires the partitioning of data or tasks among processors such that the computational load on each node is roughly the same, while communication is minimized. Obtaining exact solutions is computationally intractable, since graph partitioning is NP-complete. For a large class of irregular and adaptive data parallel applications (such as adaptive graphs), the computational structure changes from one phase to another in an incremental fashion. In incremental graph-partitioning problems the partitioning of the graph needs to be updated as the graph changes over time; a small number of nodes or edges may be added or deleted at any given instant. In this paper, we use a linear programming-based method to solve the incremental graph-partitioning problem. All the steps used by our method are inherently parallel and hence our approach can be easily parallelized. By using an initial solution for the graph partitions derived from recursive spectral bisection-based methods, our methods can achieve repartitioning at considerably lower cost than can be obtained by applying recursive spectral bisection. Further, the quality of the partitioning achieved is comparable to that achieved by applying recursive spectral bisection to the incremental graphs from scratch 相似文献

8.

Two-stage m-way graph partitioning

H. B. Zhou 《Parallel Computing》1993,19(12):1359-1373

This paper presents a group of multiple-way graph (with weighted nodes and edges) partitioning algorithms based on a 2-stage constructive-and-refinement mechanism. The graph partitions can be used to control allocation of program units to distributed processors in a way that minimizes the completion time and for design automation applications. In the constructive stage, 4 clustering algorithms are used to construct raw partitions, the second refinement step first adjusts the cluster number to the processor number and then iteratively improves the partitioning cost by employing a Kernighan-Lin based heuristic. This approach represents several extensions to the state-of-the-art methods. A performance comparison of the proposed algorithms is given, based on experiment results. 相似文献

9.

Partitioning dynamic graph asynchronously with distributed FENNEL

《Future Generation Computer Systems》2017

Graph partitioning is important in distributed graph processing. Classical method such as METIS works well on relatively small graphs, but hard to scale for huge, dynamic graphs. Streaming graph partitioning algorithms overcome this issue by processing those graphs as streams. Among these algorithms, FENNEL achieves better edge cut ratio, even close to METIS, but consumes less memory and is significantly faster. On the other hand, graph partitioning may also benefit from distributed graph processing. However, to deploy FENNEL on a cluster, it is important to avoid quality loss and keep efficiency high. The direct implementation of this idea yields a synchronous model and a star-shaped network, which limits both scalability and efficiency. Targeting these two problems, we propose an asynchronous model, combined with a dedicated tree-shaped map-reduce network which is prevail in systems such as Apache Hadoop and Spark, to form AsyncFENNEL (asynchronous FENNEL). We theoretically prove that, the impact on partition quality brought by asynchronous model can be kept as minimal. We test AsyncFENNEL with various synthetic and real-world graphs, the comparison between synchronous and asynchronous models shows that, for streamed natural graphs, AsyncFENNEL can improve performance significantly (above 300% with 8 workers/partitions) with negligible loss on edge cut ratio. However, more worker nodes will introduce a heavier network traffic and reduce efficiency. The proposed tree-shaped map-reduce network can mitigate that impact and increase the performance in that case. 相似文献

10.

一种基于GN算法的动态图划分方法

罗晓霞王佳罗香玉李嘉楠《计算机工程与科学》2022,44(2):306-311

随着图规模的急剧增长,对动态图进行实时处理的需求日益增加.大多现有的算法针对静态图划分是有效的,直接用其处理动态图会带来较大的通信开销.针对该问题,提出一种基于GN算法的动态图划分方法.首先收集一段时间内加入动态图中的顶点;然后,利用GN算法对这些新加入的顶点进行预划分,产生若干个内部联系紧密的社区;最后,将预划分产生... 相似文献

11.

Fast parallel algorithms for graph similarity and matching

Giorgos Kollias Madan Sathe Olaf Schenk Ananth Grama 《Journal of Parallel and Distributed Computing》2014

This paper addresses the problem of global graph alignment on supercomputer-class clusters. We define the alignment of two graphs, as a mapping of each vertex in the first graph to a unique vertex in the second graph so as to optimize a given similarity-based cost function.¹ Using a state of the art serial algorithm for the computation of vertex similarity scores called Network Similarity Decomposition (NSD), we derive corresponding parallel formulations. Coupling this parallel similarity algorithm with a parallel auction-based bipartite matching technique, we obtain a highly efficient and scalable graph matching pipeline. We validate the performance of our integrated approach on a large parallel platform and on diverse graph instances (including Protein Interaction, Wikipedia and Web networks). Experimental results demonstrate that our algorithms scale to large machine configurations (thousands of cores) and problem instances, enabling the alignment of networks of sizes two orders of magnitude larger than reported in the current literature. 相似文献

12.

基于广度优先遍历加权图生成的启发式图分区

下载免费PDF全文

蹇冬宇程永利《计算机系统应用》2023,32(12):218-223

图分区质量极大程度上影响着计算机之间的通信开销和负载平衡, 这对于大规模并行图计算的性能是至关重要的. 然而, 随着图数据规模的越来越大, 图分区算法的执行时间成了一个不可避免的问题. 因此, 研究如何优化图分区算法的执行效率是有必要的. 本文提出了一个基于广度优先遍历加权图生成的启发式图分割方法, 该方法在实现较低的通信代价和较好负载平衡的同时, 只引入了少量的预处理时间开销. 实验结果表明, 本文的划分方法减少了复制因子, 降低通信开销, 并且引入的时间开销较小. 相似文献

13.

知识图谱划分算法研究综述 总被引：6，自引：0，他引：6

王鑫陈蔚雪杨雅君张小旺冯志勇《计算机学报》2021,44(1):235-260

知识图谱是人工智能的重要基石,因其包含丰富的图结构和属性信息而受到广泛关注.知识图谱可以精确语义描述现实世界中的各种实体及其联系,其中顶点表示实体,边表示实体间的联系.知识图谱划分是大规模知识图谱分布式处理的首要工作,对知识图谱分布式存储、查询、推理和挖掘起基础支撑作用.随着知识图谱数据规模及分布式处理需求的不断增长,如何对其进行划分已成为目前知识图谱研究的热点问题.从知识图谱和图划分的定义出发,系统性地介绍当前知识图谱数据划分的各类算法,包括基本、多级、流式、分布式和其他类型图划分算法.首先,介绍4种基本图划分算法:谱划分算法、几何划分算法、分支定界算法、KL及其衍生算法,这类算法通常用于小规模图数据或作为其他划分算法的一部分;然后,介绍多级图划分算法,这类算法对图粗糙化后进行划分再投射回原始图,根据粗糙化过程分为基于匹配的算法和基于聚合的算法;其次,描述3种流式图划分算法,这类算法将顶点或边加载为序列后进行划分,包括Hash算法、贪心算法、Fennel算法,以及这3种算法的衍生算法;再次,介绍以KaPPa、JA-BE-JA和轻量级重划分为代表的分布式图划分算法及它们的衍生算法;同时,在其他类型图划分算法中,介绍近年来新兴的2种图划分算法:标签传播算法和基于查询负载的算法.通过在合成与真实知识图谱数据集上的丰富实验,比较了5类知识图谱代表性划分算法在划分效果、查询处理与图数据挖掘方面的性能差异,分析实验结果并推广到推理层面,获得了基于实验的知识图谱划分算法性能评价结论.最后,在对已有方法分析和比较的基础上,总结目前知识图谱数据划分面临的主要挑战,提出相应的研究问题,并展望未来的研究方向. 相似文献

14.

双目标优化的RDF图分割算法

陈志奎冷泳林《计算机工程与应用》2017,53(21):24-31

分布式存储是解决大规模数据存储的一种比较有效的方法,而数据分割是实现分布式存储的前提。面对不断增长的RDF数据,提出一种基于双目标优化的RDF图分割算法（RDF Graph Partitioning algorithm based on Double Objective Optimization,RGPDOO）。RGPDOO将边割和分割平衡两项图分割指标融合到一个目标函数,并依据此目标函数,实现了RDF图的静态和动态分割。其中静态图分割通过对图进行初始划分,将图中顶点分成内核顶点、交叉顶点和自由顶点三类。然后通过计算目标函数增益对交叉和自由顶点进行分配。动态图分割部分,针对RDF元组的插入和删除给出相应的解决方案。同时,为了满足图分割目标,算法每隔一段时间[T]会根据子图的平衡性和紧密性进行一次动态调整。实验选择合成和真实数据集进行测试,并分别与几种通用的静态和动态图分割算法进行比较。实验结果表明提出的算法能够有效地实现RDF图的静态和动态分割。相似文献

15.

A parallel algorithm for the enumeration of the spanning trees of a graph

Shao-Wen Mai D.J. Evans 《Parallel Computing》1984,1(3-4):275-286

As is well known, the strategy of divide-and-conquer is widely used in problem solving. The method of partitioning is also a fundamental strategy for the design of a parallel algorithm. The problem of enumerating the spanning trees of a graph arises in several contexts such as computer-aided design and computer networks. A parallel algorithm for solving the problem is presented in this paper. It is based on the principle of the inclusion and exclusion of sets, and not directly based on the partitioning of the graph itself. The results of the preliminary experiments on a MIMD system appear promising. 相似文献

16.

Efficient parallel algorithms for graph problems 总被引：1，自引：0，他引：1

Clyde P. Kruskal Larry Rudolph Marc Snir 《Algorithmica》1990,5(1):43-64

We present an efficient technique for parallel manipulation of data structures that avoids memory access conflicts. That is, this technique works on the Exclusive Read/Exclusive Write (EREW) model of computation, which is the weakest shared memory, MIMD machine model. It is used in a new parallel radix sort algorithm that is optimal for keys whose values are over a small range. Using the radix sort and known results for parallel prefix on linked lists, we develop parallel algorithms that efficiently solve various computations on trees and unicycular graphs. Finally, we develop parallel algorithms for connected components, spanning trees, minimum spanning trees, and other graph problems. All of the graph algorithms achieve linear speedup for all but the sparsest graphs.Part of this work was done while the first author was at the University of Illinois, Urbana-Champaign, the second author was at Carnegie-Mellon University, and the third author was at the Hebrew University and the Courant Institute of Mathematical Sciences, New York University. A preliminary version of this work was presented at the 1986 International Conference on Parallel Processing. 相似文献

17.

Geometric crossovers for multiway graph partitioning 总被引：1，自引：0，他引：1

Moraglio A Kim YH Yoon Y Moon BR 《Evolutionary computation》2007,15(4):445-474

Geometric crossover is a representation-independent generalization of the traditional crossover defined using the distance of the solution space. By choosing a distance firmly rooted in the syntax of the solution representation as a basis for geometric crossover, one can design new crossovers for any representation. Using a distance tailored to the problem at hand, the formal definition of geometric crossover allows us to design new problem-specific crossovers that embed problem-knowledge in the search. The standard encoding for multiway graph partitioning is highly redundant: each solution has a number of representations, one for each way of labeling the represented partition. Traditional crossover does not perform well on redundant encodings. We propose a new geometric crossover for graph partitioning based on a labeling-independent distance that filters out the redundancy of the encoding. A correlation analysis of the fitness landscape based on this distance shows that it is well suited to graph partitioning. A second difficulty with designing a crossover for multiway graph partitioning is that of feasibility: in general recombining feasible partitions does not lead to feasible offspring partitions. We design a new geometric crossover for permutations with repetitions that naturally suits partition problems and we test it on the graph partitioning problem. We then combine it with the labeling-independent crossover and obtain a much superior geometric crossover inheriting both advantages. 相似文献

18.

Parallel construction of multidimensional binary search trees

Al-Furajh I. Aluru S. Goil S. Ranka S. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(2):136-148

Multidimensional binary search tree (abbreviated k-d tree) is a popular data structure for the organization and manipulation of spatial data. The data structure is useful in several applications including graph partitioning, hierarchical applications such as molecular dynamics and n-body simulations, and databases. In this paper, we study efficient parallel construction of k-d trees on coarse-grained distributed memory parallel computers. We consider several algorithms for parallel k-d tree construction and analyze them theoretically and experimentally, with a view towards identifying the algorithms that are practically efficient. We have carried out detailed implementations of all the algorithms discussed on the CM-5 and report on experimental results 相似文献

19.

Tuning the granularity of parallelism for distributed graph processing

Xinyuan Luo Sai Wu Wei Wang Lidan Shou 《Distributed and Parallel Databases》2017,35(2):117-148

Popular distributed graph processing frameworks, such as Pregel and GraphLab, are based on the vertex-centric computation model, where users write their customized Compute function for each vertex to process the data iteratively. Vertices are evenly partitioned among the compute nodes, and the granularity of parallelism of the graph algorithm is normally tuned by adjusting the number of compute nodes. Vertex-centric model splits the computation into phases. Inside one specific phase, the computation proceeds as an embarrassingly parallel process, because no communication between compute nodes incurs. By default, current graph engine only handles one iteration of the algorithm in a phase. However, in this paper, we find that we can also tune the granularity of parallelism, by aggregating the computation of multiple iterations into one phase, which has a significant impact on the performance of the graph algorithm. In the ideal case, if all computations are handled in one phase, the whole algorithm turns into an embarrassingly parallel algorithm and the benefit of parallelism is maximized. Based on this observation, we propose two approaches, a function-based approach and a parameter-based approach, to automatically transform a Pregel algorithm into a new one with tunable granularity of parallelism. We study the cost of such transformation and the trade-off between the granularity of parallelism and the performance. We provide a new direction to tune the performance of parallel algorithms. Finally, the approaches are implemented in our graph processing system, N2, and we illustrate their performance using popular graph algorithms. 相似文献

20.

MapReduce in MPI for Large-scale graph algorithms 总被引：1，自引：0，他引：1

Steven J. PlimptonKaren D. Devine 《Parallel Computing》2011,37(9):610-632

相似文献