首页 | 官方网站   微博 | 高级检索  
     

面向异构架构的传递闭包并行算法
引用本文:肖汉,郭宝云,李彩林,周清雷.面向异构架构的传递闭包并行算法[J].计算机工程,2021,47(8):131-139.
作者姓名:肖汉  郭宝云  李彩林  周清雷
作者单位:1. 郑州师范学院 信息科学与技术学院, 郑州 450044;2. 山东理工大学 建筑工程学院, 山东 淄博 255000;3. 郑州大学 信息工程学院, 郑州 450001
基金项目:国家自然科学基金(41601496,41701525,61572444);山东省自然科学基金(ZR2017LD002);山东省重点研发计划项目(2018GGX106002)。
摘    要:传统求图传递闭包的方法存在计算量大与计算时间长的问题。为加快处理大数据量的传递闭包算法的计算速度,结合算法密集计算和开放式计算语言(OpenCL)框架的特征,采用本地存储器优化的并行子矩阵乘和分块的矩阵乘并行计算,提出一种基于OpenCL的传递闭包并行算法。利用本地存储器优化的并行子矩阵乘算法来优化计算步骤,提高图形处理器(GPU)的存储器利用率,降低数据获取延迟。通过分块矩阵乘并行计算算法实现大数据量的矩阵乘,提高GPU计算核心的利用率。数据结果表明,与CPU串行算法、基于开放多处理的并行算法和基于统一设备计算架构的并行算法相比,传递闭包并行算法在OpenCL架构下NVIDIA GeForce GTX 1070计算平台上分别获得了593.14倍、208.62倍和1.05倍的加速比。

关 键 词:矩阵乘  传递闭包  图形处理器  开放式计算语言  并行算法  
收稿时间:2020-04-15
修稿时间:2020-07-16

Parallel Transitive Closure Algorithm for Heterogeneous Architecture
XIAO Han,GUO Baoyun,LI Cailin,ZHOU Qinglei.Parallel Transitive Closure Algorithm for Heterogeneous Architecture[J].Computer Engineering,2021,47(8):131-139.
Authors:XIAO Han  GUO Baoyun  LI Cailin  ZHOU Qinglei
Affiliation:1. School of Information Science and Technology, Zhengzhou Normal University, Zhengzhou 450044, China;2. School of Civil and Architectural Engineering, Shandong University of Technology, Zibo, Shandong 255000, China;3. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China
Abstract:The traditional method for obtaining the transitive closure of the graphs faces the large amount of calculation and long calculation time. In order to improve the computing speed of the transitive closure algorithm for dealing with large amounts of data, an Open Computing Language(OpenCL)-based parallel algorithm for transitive closure is proposed. The algorithm combines the characteristics of algorithm-intensive computation and OpenCL architecture, and uses the parallel submatrix multiplication and block matrix multiplication optimized by local memory for parallel computing. The parallel submatrix multiplication algorithm is used to optimize the computational steps, improves the memory utilization of the Graphic Processing Unit(GPU), and reduces the data acquisition delay. The parallel block matrix multiplication algorithm is used to implement matrix multiplication involving large amounts of data, and improve the utilization of the GPU computing cores. The experimental results show that compared with the sequential CPU-based algorithm, parallel algorithm based on Open Multi-Processing, and parallel algorithm based on Compute Unified Device Architecture(CUDA), the proposed parallel transitive closure algorithm provides a 593.14 times, 208.62 times and 1.05 times speedup respectively on the NVIDIA GeForce GTX 1070 computing platform with OpenCL architecture.
Keywords:matrix multiplication  transitive closure  Graphic Processing Unit(GPU)  Open Computing Language (OpenCL)  parallel algorithm  
本文献已被 万方数据 等数据库收录!
点击此处可从《计算机工程》浏览原始摘要信息
点击此处可从《计算机工程》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司    京ICP备09084417号-23

京公网安备 11010802026262号