• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (03): 389-397.

A CNN accelerator based on 3D scalable PE array

SU Zi-pei, YANG Xin, CHEN Di-hu, SU Tao

  1. (School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510275, China)

  • Received: 2020-04-30  Revised: 2020-06-28  Accepted: 2021-03-25  Online: 2021-03-25  Published: 2021-03-26

Abstract: Convolutional neural networks (CNNs) have large numbers of parameters and require a large amount of computation. When they are deployed on mobile devices, chip area must be kept as small as possible while still meeting the required frame rate (speed). Taking into account the compatibility, performance, area, and other requirements of current mobile-oriented networks, a CNN accelerator based on a 3D scalable PE array is designed. The accelerator supports 3×3 convolution, 3×3 depthwise separable convolution, 1×1 convolution, and fully connected layers. The parallelism of its PE array can be set independently in three dimensions, so the optimal parameters can be chosen for the network and hardware constraints of a specific application to achieve better performance. Running YOLOv2 on 512 PEs, the proposed accelerator achieves 76.52 GOPS (74.72% performance efficiency); running MobileNet-v1 on 512 PEs, it achieves 78.05 GOPS (76.22% performance efficiency). The accelerator is also used to build a real-time target detection system on a ZC706 FPGA board, where running YOLO-LITE achieves a frame rate of 53.65 fps.
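The throughput and efficiency figures above can be cross-checked with a little arithmetic. The Python sketch below assumes that "performance efficiency" means achieved throughput divided by peak throughput, and that each PE performs one multiply-accumulate (counted as 2 operations) per cycle. Neither definition is stated in the abstract, and the 100 MHz clock used for comparison is only an inference from the numbers, not a reported value. The function names are hypothetical.

```python
def implied_peak_gops(achieved_gops: float, efficiency_pct: float) -> float:
    """Peak throughput implied by an achieved figure and its efficiency.

    Assumes efficiency = achieved / peak (not defined in the abstract).
    """
    return achieved_gops / (efficiency_pct / 100.0)


def theoretical_peak_gops(num_pes: int, clock_mhz: float,
                          ops_per_pe_per_cycle: int = 2) -> float:
    """Peak throughput of a PE array, assuming one MAC (2 ops) per PE per cycle."""
    return num_pes * ops_per_pe_per_cycle * clock_mhz / 1000.0


# Figures quoted in the abstract (512 PEs in both cases).
yolo_peak = implied_peak_gops(76.52, 74.72)
mobilenet_peak = implied_peak_gops(78.05, 76.22)

# Hypothetical clock: 100 MHz is NOT given in the abstract; it is the value
# that makes 512 PEs x 2 ops/cycle match the implied peak.
assumed_peak = theoretical_peak_gops(num_pes=512, clock_mhz=100.0)

# Per-frame latency implied by the reported 53.65 fps on the ZC706 board.
frame_time_ms = 1000.0 / 53.65

print(f"implied peak (YOLOv2):       {yolo_peak:.2f} GOPS")
print(f"implied peak (MobileNet-v1): {mobilenet_peak:.2f} GOPS")
print(f"512 PEs at assumed 100 MHz:  {assumed_peak:.2f} GOPS")
print(f"frame time at 53.65 fps:     {frame_time_ms:.2f} ms")
```

Under these assumptions both workloads imply nearly the same peak of about 102.4 GOPS, which suggests the two efficiency figures were computed against a common baseline.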

Key words: CNN accelerator, 3D PE array, target detection, SoC