FDGLib: A Communication Library for Efficient Large-Scale Graph Processing in FPGA-Accelerated Data Centers |
| |
Authors: | Yu-Wei Wu, Qing-Gang Wang, Long Zheng, Xiao-Fei Liao, Hai Jin, Wen-Bin Jiang, Ran Zheng, Kan Hu |
| |
Affiliation: | National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Services Computing Technology and System Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Cluster and Grid Computing Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China |
| |
Abstract: | With the rapid growth of real-world graphs, the size of which can easily exceed the on-chip (board) storage capacity of an accelerator, processing large-scale graphs on a single Field Programmable Gate Array (FPGA) becomes difficult. Multi-FPGA acceleration is therefore of great necessity and importance. Many cloud providers (e.g., Amazon, Microsoft, and Baidu) now expose FPGAs to users in their data centers, providing opportunities to accelerate large-scale graph processing. In this paper, we present a communication library, called FDGLib, which can easily scale out any existing single FPGA-based graph accelerator to a distributed version in a data center, with minimal hardware engineering effort. FDGLib provides six APIs that can be easily used and integrated into any FPGA-based graph accelerator with only a few lines of code modifications. Considering the torus-based FPGA interconnection in data centers, FDGLib also improves communication efficiency using simple yet effective torus-friendly graph partition and placement schemes. We interface FDGLib into AccuGraph, a state-of-the-art graph accelerator. Our results on a 32-node Microsoft Catapult-like data center show that the distributed AccuGraph can be 2.32x and 4.77x faster than a state-of-the-art distributed FPGA-based graph accelerator, ForeGraph, and a distributed CPU-based graph system, Gemini, respectively, with better scalability. |
| |
Keywords: | data center; accelerator; graph processing; distributed architecture; communication optimization |
This article is indexed in Wanfang Data (万方数据) and other databases. |
Source: | Journal of Computer Science and Technology (《计算机科学技术学报》) |