Design and FPGA implementation of YOLOv3-tiny hardware acceleration

Computer Engineering & Science, 2021, Vol. 43, Issue (12): 2139-2149.

CHEN Hao-min1, YAO Sen-jing1, XI Yu1, ZHANG Fan1, XIN Wen-cheng1, WANG Long-hai2, REN Chao2

(1. China Southern Power Grid Digital Grid Research Institute Limited Company, Guangzhou 510623, China;
2. School of Electrical Automation and Information Engineering, Tianjin University, Tianjin 300072, China)

Received: 2020-11-10  Revised: 2021-01-04  Accepted: 2021-12-25  Online: 2021-12-25  Published: 2021-12-31

Funding: National Natural Science Foundation of China (61972282)

Abstract

Abstract: YOLOv3-tiny offers strong object detection performance, but the model still demands considerable computation, which makes it difficult to deploy in embedded applications. This paper proposes a hardware acceleration method for YOLOv3-tiny and implements it on an FPGA platform. First, for the fixed-point design of the network, with data precision and resource consumption as the design criteria, different fixed-point strategies are proposed based on statistics of the data distributions in the model and a classification of its data types. Second, for the parallel design of the network, the computational characteristics of convolutional neural networks are analyzed, and loop adjustment, loop tiling, loop unrolling, and array partitioning are used to design a scalable architecture of common hardware computing units. Third, for the pipeline design of the network, both the inter-layer and the intra-layer levels are studied; based on the direction of the inter-layer data flow and the division of tasks within a layer, a flexible pipelined computing architecture is designed. Finally, experiments on the XILINX XC7Z020CLG400-1 platform show that, compared with a single-core ARM-A9 processor running at 667 MHz, the design achieves a speedup of up to 290.56.

Key words: YOLOv3-tiny, convolutional neural network, field programmable gate array (FPGA), hardware acceleration
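The fixed-point and loop-tiling ideas named in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' design: the abstract does not give bit widths, tile sizes, or HLS directives, so the 16-bit word, the 8 fractional bits, the tile sizes, and the function names `to_fixed`, `conv_direct`, and `conv_tiled` are all hypothetical. The NumPy `einsum` call over one tile stands in for the inner loops that a hardware design would unroll into parallel multiply-accumulate units.

```python
import numpy as np

def to_fixed(x, frac_bits, total_bits=16):
    """Quantize floats to signed fixed-point integers (assumed format, not from the paper)."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    # Round to the nearest representable step, then saturate to the word range.
    return np.clip(np.round(x * scale), lo, hi).astype(np.int64)

def conv_direct(x, w):
    """Reference convolution: x is (C_in, H, W), w is (C_out, C_in, K, K); stride 1, no padding."""
    c_out, _, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out), dtype=x.dtype)
    for o in range(c_out):
        for r in range(h_out):
            for c in range(w_out):
                y[o, r, c] = np.sum(x[:, r:r + k, c:c + k] * w[o])
    return y

def conv_tiled(x, w, t_out=2, t_in=2):
    """Same convolution with the output- and input-channel loops tiled.

    Each (t_out x t_in) tile is processed as one unit; in hardware, the work
    inside a tile is what would be unrolled, with buffers partitioned to match.
    """
    c_out, c_in, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.zeros((c_out, h_out, w_out), dtype=x.dtype)
    for o0 in range(0, c_out, t_out):          # tile loop over output channels
        for i0 in range(0, c_in, t_in):        # tile loop over input channels
            w_tile = w[o0:o0 + t_out, i0:i0 + t_in]
            for r in range(h_out):
                for c in range(w_out):
                    patch = x[i0:i0 + t_in, r:r + k, c:c + k]
                    # Partial sums for this tile are accumulated into the output.
                    y[o0:o0 + t_out, r, c] += np.einsum("oikl,ikl->o", w_tile, patch)
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xq = to_fixed(rng.uniform(-1, 1, (3, 6, 6)), frac_bits=8)
    wq = to_fixed(rng.uniform(-1, 1, (5, 3, 3, 3)), frac_bits=8)
    # Integer arithmetic makes the tiled and direct results bit-identical.
    print(np.array_equal(conv_direct(xq, wq), conv_tiled(xq, wq)))
```

Because both operands carry 8 fractional bits in this sketch, the accumulated outputs carry 16 fractional bits; a real design would rescale them before passing them to the next layer.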

CHEN Hao-min, YAO Sen-jing, XI Yu, ZHANG Fan, XIN Wen-cheng, WANG Long-hai, REN Chao. Design and FPGA implementation of YOLOv3-tiny hardware acceleration[J]. Computer Engineering & Science, 2021, 43(12): 2139-2149.
