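• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science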

• Articles •

A high performance and energy efficient KV accelerator for FPGA-based heterogeneous computing

SUN Zheng-zheng, LAN Ya-zhu, FU Bin-zhang

  1. (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)
  • Received: 2016-04-03 Revised: 2016-06-11 Online: 2016-08-25
  • Supported by:

    National Natural Science Foundation of China (61331008, 61521092); Strategic Priority Research Program of the Chinese Academy of Sciences (XDA06010401); Huawei Class-A High-Throughput Server Project (YBCB2011030)

Abstract:

Emerging applications such as network function virtualization (NFV) place higher demands on the performance and energy efficiency of Key-Value (KV) lookups. Traditionally, KV lookups are accelerated either with software hash tables or with dedicated ternary content addressable memory (TCAM) chips. Software solutions are inexpensive, but their lookup performance drops sharply when collisions are frequent. TCAM chips offer excellent lookup latency, but they are expensive and consume a great deal of power. With the rapid development of heterogeneous computing based on field programmable gate arrays (FPGAs), using the FPGA resources already present in the system to accelerate software hash tables has become a more cost-effective option. To this end, this paper discusses how to implement highly scalable and energy-efficient TCAM logic with the RAM resources on an FPGA. Unlike the traditional TCAM architecture, the proposed architecture allows the lookup range to be scaled dynamically, which effectively reduces lookup power consumption. To validate the proposed architecture, we implemented and evaluated it on a Xilinx Virtex-7 FPGA and compared it in detail with software lookup. Experimental results show that the throughput reaches 234 Mpps with a lookup latency of 25.56 ns. Compared with the software approach, throughput is 780 times higher and latency is 240 times lower.
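The abstract does not describe the internal organization of the RAM-based TCAM, so the following is only a minimal software sketch of one common way to emulate ternary matching with RAM, not the authors' design. It assumes the key is split into 4-bit chunks, each chunk addresses one RAM block whose word is a bit vector of matching rules, and the per-chunk vectors are ANDed before a priority encoder selects the lowest-numbered rule. The names `RamTcam`, `CHUNK_BITS`, `write`, `set_range`, and `lookup` are hypothetical. The adjustable lookup range is modeled by masking rules outside the active range, which stands in for whatever mechanism the hardware uses to limit the searched range.

```python
from typing import Any, List, Optional, Tuple

CHUNK_BITS = 4  # assumption: each RAM block is addressed by 4 key bits


class RamTcam:
    """Toy model of a TCAM emulated with RAM blocks (illustrative only)."""

    def __init__(self, key_bits: int, num_rules: int) -> None:
        assert key_bits % CHUNK_BITS == 0
        self.num_chunks = key_bits // CHUNK_BITS
        self.num_rules = num_rules
        # ram[c][addr] is an integer bit vector: bit r is set when rule r
        # matches a key whose chunk c equals addr.
        self.ram: List[List[int]] = [
            [0] * (1 << CHUNK_BITS) for _ in range(self.num_chunks)
        ]
        self.values: List[Any] = [None] * num_rules
        self.active_rules = num_rules  # current lookup range

    def _chunk(self, word: int, c: int) -> int:
        """Return chunk c of word, with chunk 0 being the most significant."""
        shift = (self.num_chunks - 1 - c) * CHUNK_BITS
        return (word >> shift) & ((1 << CHUNK_BITS) - 1)

    def write(self, rule: int, key: int, mask: int, value: Any) -> None:
        """Install a ternary rule; mask bit 1 means care, 0 means don't care."""
        for c in range(self.num_chunks):
            k, m = self._chunk(key, c), self._chunk(mask, c)
            for addr in range(1 << CHUNK_BITS):
                if (addr & m) == (k & m):
                    self.ram[c][addr] |= 1 << rule
                else:
                    self.ram[c][addr] &= ~(1 << rule)
        self.values[rule] = value

    def set_range(self, active_rules: int) -> None:
        """Restrict lookups to rules 0 .. active_rules - 1."""
        assert 0 <= active_rules <= self.num_rules
        self.active_rules = active_rules

    def lookup(self, key: int) -> Optional[Tuple[int, Any]]:
        """Return (rule index, value) of the best match, or None on a miss."""
        match = (1 << self.active_rules) - 1  # only rules inside the range
        for c in range(self.num_chunks):
            match &= self.ram[c][self._chunk(key, c)]  # one RAM read per chunk
        if match == 0:
            return None
        rule = (match & -match).bit_length() - 1  # lowest index has priority
        return rule, self.values[rule]


tcam = RamTcam(key_bits=8, num_rules=4)
tcam.write(0, 0b10100000, 0b11110000, "A")   # pattern 1010xxxx
tcam.write(1, 0b10101100, 0b11111111, "B")   # exact match on 10101100
tcam.write(3, 0b00000000, 0b00000000, "default")  # matches every key
```

In this sketch every lookup costs one read per chunk regardless of how many rules are stored, which is the property that makes RAM-based emulation attractive compared with a software hash table under heavy collisions. Calling `tcam.set_range(2)` excludes rule 3, so a key that previously hit only the default rule then misses.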

Key words: network function virtualization, Key-Value lookup, TCAM, FPGA