
A high performance and energy efficient KV accelerator for FPGA-based heterogeneous computing

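Cite this article:	SUN Zheng-zheng, LAN Ya-zhu, FU Bin-zhang. A high performance and energy efficient KV accelerator for FPGA-based heterogeneous computing[J]. Computer Engineering & Science, 2016, 38(8): 1574-1580.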

Authors:	SUN Zheng-zheng, LAN Ya-zhu, FU Bin-zhang

Affiliation:	Institute of Computing Technology, Chinese Academy of Sciences

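Funding:	National Natural Science Foundation of China (61331008, 61521092); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06010401); Huawei Class-A High-Throughput Server Project (YBCB2011030)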

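Abstract:	The rapid growth of emerging applications such as network function virtualization (NFV) places higher demands on the energy efficiency of Key-Value lookups. Traditional solutions either rely on software hash tables or accelerate lookups with dedicated ternary content-addressable memory (TCAM) chips. The software approach is cheap to implement, but lookup performance degrades sharply when collisions are frequent; hardware TCAM offers excellent timing characteristics but is expensive and power-hungry. With the rapid development of heterogeneous computing based on field-programmable gate arrays (FPGAs), using the FPGA resources already present in the system to accelerate software hash-table structures has become a more cost-effective solution. This paper explores how to use the RAM resources on an FPGA to implement TCAM logic with high scalability and high energy efficiency. Unlike traditional TCAM structures, the proposed architecture supports dynamic scaling of the lookup range, which effectively reduces lookup power consumption. To validate the approach, it is implemented and evaluated on a Virtex-7 series FPGA and compared in detail with software lookup performance. Experiments show that the proposed design reaches a throughput of 234 Mpps with a lookup latency of 25.56 ns. Compared with the software approach, throughput is 780 times higher and latency is 240 times lower.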
Keywords:	network function virtualization; Key-Value lookup; ternary content-addressable memory; field-programmable gate array
Received:	2016-04-03
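
The abstract reports absolute FPGA figures together with improvement factors, but does not state the software baseline directly. As a rough back-of-the-envelope check, the baseline implied by those numbers can be recovered as follows. The variable names are illustrative, and the computed baseline is an inference from the abstract's figures, not a value reported by the authors.

```python
# Figures taken from the abstract.
fpga_throughput_mpps = 234.0   # million packets per second
fpga_latency_ns = 25.56        # nanoseconds per lookup
throughput_speedup = 780       # "throughput 780 times higher"
latency_reduction = 240        # "latency 240 times lower"

# Implied software baseline (an inference, not a reported measurement).
software_throughput_mpps = fpga_throughput_mpps / throughput_speedup
software_latency_ns = fpga_latency_ns * latency_reduction

print(f"implied software throughput: {software_throughput_mpps:.2f} Mpps")
print(f"implied software latency: {software_latency_ns / 1000:.2f} us")
```

Under these assumptions the software baseline works out to about 0.3 Mpps and roughly 6.13 microseconds per lookup.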
A high performance and energy efficient KV accelerator for FPGA-based heterogeneous computing

SUN Zheng-zheng,LAN Ya-zhu,FU Bin-zhang.A high performance and energy efficient KV accelerator for FPGA-based heterogeneous computing[J].Computer Engineering & Science,2016,38(8):1574-1580.

Authors:	SUN Zheng-zheng, LAN Ya-zhu, FU Bin-zhang

Affiliation:	(Institute of Computing Technology,Chinese Academy of Sciences,Beijing 100190,China)

Abstract:	The flourishing of new applications such as network function virtualization (NFV) has raised the requirements for high-performance, energy-efficient Key-Value (KV) lookups. Traditionally, KV operations are accelerated either by software hash tables or by dedicated TCAM chips. Software solutions are cost-efficient, but their performance degrades sharply as data collisions increase. TCAM-based solutions, on the other hand, deliver sound performance but incur high additional system cost and power consumption. As FPGA-based heterogeneous computing becomes increasingly popular, it is reasonable to exploit the FPGA resources already provided by the system to accelerate software-based KV operations. To this end, this paper discusses how to implement highly scalable and energy-efficient TCAM logic with the RAMs on an FPGA. Unlike the traditional TCAM architecture, the proposed FPGA-based TCAM is highly scalable and allows the lookup range to be configured dynamically, so that power consumption can be reduced significantly. To validate the proposed architecture, we implemented it on a Xilinx Virtex-7 FPGA. Experimental results show a throughput as high as 234 Mpps and a latency as low as 25.56 ns. Compared with traditional software-based solutions, throughput improves by a factor of 780 and latency by a factor of 240.

Keywords:	network function virtualization; Key-Value lookup; TCAM; FPGA
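
The abstract does not describe the internal design, so the following is only a minimal software sketch of one generic, widely used way to emulate a TCAM with ordinary RAMs: the key is split into chunks, each chunk addresses a small RAM whose word is a bit-vector of the rules that chunk can match, and the per-chunk vectors are ANDed together. The class `RamTcam`, its methods, the chunk width, and the `set_active_rules` stand-in for a configurable lookup range are all hypothetical names and assumptions, not the authors' architecture.

```python
class RamTcam:
    """Toy model of a RAM-emulated TCAM (generic technique, not the paper's design)."""

    def __init__(self, key_bits=8, chunk_bits=4, num_rules=8):
        assert key_bits % chunk_bits == 0
        self.chunk_bits = chunk_bits
        self.num_chunks = key_bits // chunk_bits
        # One RAM per key chunk; each word holds one match bit per rule.
        self.rams = [[0] * (1 << chunk_bits) for _ in range(self.num_chunks)]
        self.values = [None] * num_rules
        # Bit mask of rules that take part in a lookup (all rules by default).
        self.active = (1 << num_rules) - 1

    def write(self, rule, pattern, value):
        """Store a ternary pattern ('0', '1', 'x' = don't care), MSB first."""
        assert len(pattern) == self.num_chunks * self.chunk_bits
        for c in range(self.num_chunks):
            chunk = pattern[c * self.chunk_bits:(c + 1) * self.chunk_bits]
            for addr in range(1 << self.chunk_bits):
                bits = format(addr, f"0{self.chunk_bits}b")
                if all(p in ("x", b) for p, b in zip(chunk, bits)):
                    self.rams[c][addr] |= 1 << rule      # chunk value matches
                else:
                    self.rams[c][addr] &= ~(1 << rule)   # clear a stale match bit
        self.values[rule] = value

    def set_active_rules(self, count):
        """Hypothetical range control: search only rules 0..count-1."""
        self.active = (1 << count) - 1

    def lookup(self, key):
        match = self.active
        for c in range(self.num_chunks):
            shift = (self.num_chunks - 1 - c) * self.chunk_bits
            addr = (key >> shift) & ((1 << self.chunk_bits) - 1)
            match &= self.rams[c][addr]   # one RAM read per chunk
        if match == 0:
            return None
        rule = (match & -match).bit_length() - 1  # lowest index wins
        return self.values[rule]


tcam = RamTcam()
tcam.write(0, "1010xxxx", "A")
tcam.write(1, "10xxxxxx", "B")
tcam.write(2, "xxxxxxxx", "default")
print(tcam.lookup(0b10101111), tcam.lookup(0b10000000), tcam.lookup(0b00000001))
```

In hardware the per-chunk RAM reads would happen in parallel, which is what gives a constant lookup time regardless of how many rules are stored; restricting the active rule set is one plausible way a smaller lookup range could avoid touching unused storage.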
