• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (01): 12-20.

• High Performance Computing •

Design of convolutional neural network acceleration system based on heterogeneous platform

QIN Wen-qiang [1], WU Zhong-cheng [2,3], ZHANG Jun [2,3], LI Fang [2,3]

  1. Institute of Physical Science and Information Technology, Anhui University, Hefei 230601, China
  2. Center for High Magnetic Field Science, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
  3. High Magnetic Field Laboratory of Anhui Province, Hefei 230031, China
  • Received: 2022-09-27  Revised: 2023-02-22  Accepted: 2024-01-25  Online: 2024-01-25  Published: 2024-01-15

Abstract: Deploying convolutional neural networks (CNNs) on embedded devices with limited computing and storage resources leads to slow execution, low computational efficiency, and high power consumption. This paper proposes a novel CNN acceleration architecture based on a heterogeneous platform, and designs and implements a lightweight CNN acceleration system based on MobileNet. First, to reduce hardware resource consumption and data transfer costs, dynamic fixed-point quantization is combined with batch normalization fusion to optimize the network model and reduce the hardware design complexity of the acceleration system. Second, convolutional block partitioning, parallel convolution computation, and data flow optimization improve both the efficiency of convolution operations and the system throughput. Experimental results on the PYNQ-Z2 platform show that MobileNet inference on this acceleration system takes 0.18 seconds per image at a system power consumption of 2.62 watts, a 128-fold speedup over a single-core ARM processor.
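
The abstract names batch normalization fusion and dynamic fixed-point quantization but gives no implementation details. The following is a minimal Python sketch of the textbook forms of both techniques, not the authors' implementation. The function names, the flat per-channel weight layout, and the 8-bit word length are assumptions made for illustration only.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into convolution weights and bias.

    weights: one flat list of weights per output channel (assumed layout).
    After folding, conv(x) with the new parameters equals BN(conv(x)).
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - mu) * scale + bt)
    return folded_w, folded_b


def choose_frac_bits(values, total_bits=8):
    """Pick a fractional length per tensor so the largest magnitude still fits.

    This per-tensor choice is what makes the fixed-point format "dynamic".
    total_bits=8 is an assumed word length, not a figure from the paper.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return total_bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1  # bits for the integer part
    return total_bits - 1 - int_bits  # one bit is reserved for the sign


def quantize(values, frac_bits, total_bits=8):
    """Round to signed fixed point with frac_bits fractional bits, saturating."""
    lo = -(2 ** (total_bits - 1))
    hi = 2 ** (total_bits - 1) - 1
    step = 2.0 ** frac_bits
    return [max(lo, min(hi, round(v * step))) for v in values]


def dequantize(codes, frac_bits):
    """Map fixed-point integer codes back to real values."""
    step = 2.0 ** frac_bits
    return [c / step for c in codes]
```

Folding removes the separate normalization stage at inference time, so a hardware pipeline only needs the convolution datapath. Choosing the fractional length per tensor keeps narrow integer arithmetic usable across layers whose value ranges differ.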


Key words: field programmable gate array (FPGA), Vivado high-level synthesis, convolutional neural network, heterogeneous platform, hardware acceleration