面向异构架构的传递闭包并行算法 Parallel Transitive Closure Algorithm for Heterogeneous Architecture期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

面向异构架构的传递闭包并行算法

引用本文：	肖汉,郭宝云,李彩林,周清雷.面向异构架构的传递闭包并行算法[J].计算机工程,2021,47(8):131-139.

作者姓名：	肖汉郭宝云李彩林周清雷

作者单位：	1. 郑州师范学院信息科学与技术学院, 郑州 450044;2. 山东理工大学建筑工程学院, 山东淄博 255000;3. 郑州大学信息工程学院, 郑州 450001

基金项目：	国家自然科学基金（41601496，41701525，61572444）；山东省自然科学基金（ZR2017LD002）；山东省重点研发计划项目（2018GGX106002）。

摘要：	传统求图传递闭包的方法存在计算量大与计算时间长的问题。为加快处理大数据量的传递闭包算法的计算速度，结合算法密集计算和开放式计算语言（OpenCL）框架的特征，采用本地存储器优化的并行子矩阵乘和分块的矩阵乘并行计算，提出一种基于OpenCL的传递闭包并行算法。利用本地存储器优化的并行子矩阵乘算法来优化计算步骤，提高图形处理器（GPU）的存储器利用率，降低数据获取延迟。通过分块矩阵乘并行计算算法实现大数据量的矩阵乘，提高GPU计算核心的利用率。数据结果表明，与CPU串行算法、基于开放多处理的并行算法和基于统一设备计算架构的并行算法相比，传递闭包并行算法在OpenCL架构下NVIDIA GeForce GTX 1070计算平台上分别获得了593.14倍、208.62倍和1.05倍的加速比。
关键词：	矩阵乘传递闭包图形处理器开放式计算语言并行算法
收稿时间：	2020-04-15
修稿时间：	2020-07-16
Parallel Transitive Closure Algorithm for Heterogeneous Architecture

XIAO Han,GUO Baoyun,LI Cailin,ZHOU Qinglei.Parallel Transitive Closure Algorithm for Heterogeneous Architecture[J].Computer Engineering,2021,47(8):131-139.

Authors:	XIAO Han GUO Baoyun LI Cailin ZHOU Qinglei

Affiliation:	1. School of Information Science and Technology, Zhengzhou Normal University, Zhengzhou 450044, China;2. School of Civil and Architectural Engineering, Shandong University of Technology, Zibo, Shandong 255000, China;3. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China

Abstract:	The traditional method for obtaining the transitive closure of the graphs faces the large amount of calculation and long calculation time. In order to improve the computing speed of the transitive closure algorithm for dealing with large amounts of data, an Open Computing Language(OpenCL)-based parallel algorithm for transitive closure is proposed. The algorithm combines the characteristics of algorithm-intensive computation and OpenCL architecture, and uses the parallel submatrix multiplication and block matrix multiplication optimized by local memory for parallel computing. The parallel submatrix multiplication algorithm is used to optimize the computational steps, improves the memory utilization of the Graphic Processing Unit(GPU), and reduces the data acquisition delay. The parallel block matrix multiplication algorithm is used to implement matrix multiplication involving large amounts of data, and improve the utilization of the GPU computing cores. The experimental results show that compared with the sequential CPU-based algorithm, parallel algorithm based on Open Multi-Processing, and parallel algorithm based on Compute Unified Device Architecture(CUDA), the proposed parallel transitive closure algorithm provides a 593.14 times, 208.62 times and 1.05 times speedup respectively on the NVIDIA GeForce GTX 1070 computing platform with OpenCL architecture.

Keywords:	matrix multiplication transitive closure Graphic Processing Unit(GPU) Open Computing Language (OpenCL) parallel algorithm
本文献已被万方数据等数据库收录！
	点击此处可从《计算机工程》浏览原始摘要信息
	点击此处可从《计算机工程》下载全文

设为首页 | 免责声明 | 关于勤云 | 加入收藏