
FDGLib: A Communication Library for Efficient Large-Scale Graph Processing in FPGA-Accelerated Data Centers

Authors:	Yu-Wei Wu, Qing-Gang Wang, Long Zheng, Xiao-Fei Liao, Hai Jin, Wen-Bin Jiang, Ran Zheng, Kan Hu

Affiliation:	National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Services Computing Technology and System Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Cluster and Grid Computing Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

Abstract:	Real-world graphs are growing rapidly, and their size can easily exceed the on-chip (or on-board) storage capacity of an accelerator, which makes processing large-scale graphs on a single Field Programmable Gate Array (FPGA) difficult. Multi-FPGA acceleration is therefore necessary. Many cloud providers (e.g., Amazon, Microsoft, and Baidu) now expose FPGAs to users in their data centers, providing opportunities to accelerate large-scale graph processing. In this paper, we present a communication library, called FDGLib, which can scale out any existing single-FPGA graph accelerator to a distributed version in a data center with minimal hardware engineering effort. FDGLib provides six APIs that can be integrated into any FPGA-based graph accelerator with only a few lines of code modification. Considering the torus-based FPGA interconnection in data centers, FDGLib also improves communication efficiency using simple yet effective torus-friendly graph partition and placement schemes. We integrate FDGLib into AccuGraph, a state-of-the-art graph accelerator. Our results on a 32-node Microsoft Catapult-like data center show that the distributed AccuGraph can be 2.32x and 4.77x faster than ForeGraph, a state-of-the-art distributed FPGA-based graph accelerator, and Gemini, a distributed CPU-based graph system, respectively, with better scalability.
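
The abstract's point about torus-friendly placement can be illustrated with a small sketch. The code below is not FDGLib's API or its actual placement scheme; it only computes hop counts on a 2D torus and a hypothetical traffic-weighted placement cost, to show why mapping heavily communicating partitions to nearby nodes reduces communication. The function names, the row-major node numbering, and the 4 x 8 layout used in the usage note are all assumptions made for illustration.

```python
def torus_hops(a, b, rows, cols):
    """Minimum hop count between nodes a and b on a rows x cols 2D torus.

    Nodes are numbered row-major (an assumption for this sketch).
    Each dimension wraps around, so the distance along a dimension is
    the shorter of the direct path and the wrap-around path.
    """
    ar, ac = divmod(a, cols)
    br, bc = divmod(b, cols)
    dr = abs(ar - br)
    dc = abs(ac - bc)
    return min(dr, rows - dr) + min(dc, cols - dc)


def placement_cost(traffic, placement, rows, cols):
    """Hypothetical cost model: sum of (traffic volume x hop count).

    traffic:   dict mapping (partition_p, partition_q) -> message volume
    placement: dict mapping partition id -> torus node id
    """
    return sum(
        volume * torus_hops(placement[p], placement[q], rows, cols)
        for (p, q), volume in traffic.items()
    )
```

For example, on an assumed 4 x 8 torus, `torus_hops(0, 7, 4, 8)` is 1 because the row wraps around, while `torus_hops(0, 20, 4, 8)` is 6. A placement that puts two heavily communicating partitions on nodes 0 and 7 therefore costs one sixth of a placement that puts them on nodes 0 and 20 under this cost model.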

Keywords:	data center; accelerator; graph processing; distributed architecture; communication optimization
Source:	Journal of Computer Science and Technology (《计算机科学技术学报》); indexed in Wanfang Data and other databases.
