• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (04): 641-651.

• High Performance Computing •

基于PCIe的高性能FPGA-GPU-CPU异构编程架构

孙兆鹏,周宽久   

A high performance FPGA-GPU-CPU heterogeneous programming architecture based on PCIe

SUN Zhao-peng, ZHOU Kuan-jiu

  1. (College of Software, Dalian University of Technology, Dalian, Liaoning 116620, China)
  • Received: 2020-08-05 Revised: 2020-11-10 Accepted: 2021-04-25 Online: 2021-04-25 Published: 2021-04-21
  • Funding:
    Fundamental Research Funds for the Central Universities (DUT19ZD104)

Abstract: Heterogeneous computing is a form of parallel computing that assigns each task to the computing resource best suited to it, which gives it clear advantages in improving the computing performance, energy efficiency, and real-time behavior of servers. However, current heterogeneous computing environments are complex to program, and their trustworthiness cannot be guaranteed. To address these problems, this paper proposes a programming framework based on the state transition matrix (STM) that integrates GPU and FPGA resources. The framework integrates the application programming interfaces (APIs) of CUDA and Vivado through the STM and automatically generates the standard C code required for heterogeneous computing. The GPU and FPGA devices are connected through the PCI Express (PCIe) bus, so data can be transferred between these heterogeneous computing units without passing through system CPU memory. In addition, GPUDirect RDMA is used to implement PCIe communication with the FPGA as the bus master, which overcomes the weak read performance of PCIe communication with the GPU as the master. Experimental results show that, compared with communication through shared memory, the communication efficiency of PCIe communication with the FPGA as the master is 1.9 times higher, and the achieved data rate is close to the theoretical maximum bandwidth.

Key words: state transition matrix, heterogeneous computing, FPGA, GPU, PCIe
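
To make the idea of a state transition matrix concrete, the following is a minimal table-driven sketch in standard C. It is not the code generated by the authors' framework, which the abstract does not show. The states, events, `Context` counters, and the `launch_fpga`/`launch_gpu` stand-ins are all hypothetical names chosen for illustration; in a real system those actions would wrap device API calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states and events for one offload job (not from the paper). */
typedef enum { ST_IDLE, ST_FPGA_BUSY, ST_GPU_BUSY, ST_DONE, ST_COUNT } State;
typedef enum { EV_START, EV_FPGA_DONE, EV_GPU_DONE, EV_COUNT } Event;

/* Counters that stand in for real device work. */
typedef struct {
    int fpga_jobs;
    int gpu_jobs;
} Context;

typedef void (*Action)(Context *ctx);

/* One matrix cell: the action to run and the state to enter afterwards. */
typedef struct {
    State next;
    Action action; /* NULL means the event is ignored in this state */
} Cell;

static void launch_fpga(Context *ctx) { ctx->fpga_jobs++; } /* placeholder for an FPGA driver call */
static void launch_gpu(Context *ctx)  { ctx->gpu_jobs++; }  /* placeholder for a GPU kernel launch */
static void finish(Context *ctx)      { (void)ctx; }        /* nothing to do in this sketch */

/* Rows are current states, columns are events.
   Cells not listed are zero-initialized, so their action is NULL. */
static const Cell stm[ST_COUNT][EV_COUNT] = {
    [ST_IDLE][EV_START]          = { ST_FPGA_BUSY, launch_fpga },
    [ST_FPGA_BUSY][EV_FPGA_DONE] = { ST_GPU_BUSY,  launch_gpu  },
    [ST_GPU_BUSY][EV_GPU_DONE]   = { ST_DONE,      finish      },
};

/* Apply one event: look up the cell, run its action, return the new state. */
State stm_step(State s, Event e, Context *ctx)
{
    assert(s < ST_COUNT && e < EV_COUNT);
    const Cell *c = &stm[s][e];
    if (c->action == NULL) {
        return s; /* no transition defined: stay in the current state */
    }
    c->action(ctx);
    return c->next;
}

/* Feed a sequence of events through the matrix and return the final state. */
State stm_run(State s, const Event *events, size_t n, Context *ctx)
{
    for (size_t i = 0; i < n; i++) {
        s = stm_step(s, events[i], ctx);
    }
    return s;
}
```

Because every state/event pair has an explicit cell, undefined combinations are visible in the table rather than hidden in control flow, which is one plausible reason a matrix-based description lends itself to automatic code generation and checking.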