Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (12): 2139-2149.
Previous Articles Next Articles
CHEN Hao-min1,YAO Sen-jing1,XI Yu1,ZHANG Fan1,XIN Wen-cheng1,WANG Long-hai2,REN Chao2
Received:
Revised:
Accepted:
Online:
Published:
Abstract: YOLOv3-tiny has the excellent target detection capability, but the computational power required by the model is still large, so it is difficult to be used in the embedded application field. This paper proposes a hardware acceleration method of YOLOv3-tiny and implements it on FPGA platform. Firstly, for the fixed-point design of the network, with data accuracy and resource consumption as design indicators, through the statistics of the data distribution in the model and the division of data types, different fixed-point strategies are determined. Secondly, for the parallel design of the network, through the analysis of the calculation characteristics of the convolutional neural network, with the methods of loop adjustment, loop block, loop expansion, and array splitting, a scalable common hardware comput- ing unit is designed. Then, for the network pipeline design, the research is carried out from two aspects: the inter-layer and the intra-layer. Based on the direction of the inter-layer data flow and the division of tasks within the layer, a flexible pipeline computing architecture is designed. Lastly, on the XILINX XC7Z020CLG400-1 platform, experiments demonstrate that, compared with single-core ARM-A9 processor at 667MHz, the proposal achieves the calculation speed as high as 290.56.
Key words: YOLOv3-tiny, convolutional neural network, field programmable gate array, hardware acceleration ,
CHEN Hao-min, YAO Sen-jing, XI Yu, ZHANG Fan, XIN Wen-cheng, WANG Long-hai, REN Chao. Design and FPGA implementation of YOLOv3-tiny hardware acceleration[J]. Computer Engineering & Science, 2021, 43(12): 2139-2149.
0 / / Recommend
Add to citation manager EndNote|Ris|BibTeX
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2021/V43/I12/2139