Design of convolutional neural network acceleration system based on heterogeneous platform

Computer Engineering & Science, 2024, Vol. 46, Issue (01): 12-20.

QIN Wen-qiang1, WU Zhong-cheng2,3, ZHANG Jun2,3, LI Fang2,3

(1. Institute of Physical Science and Information Technology, Anhui University, Hefei 230601;
2. Center for High Magnetic Field Science, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031;
3. High Magnetic Field Laboratory of Anhui Province, Hefei 230031, China)

Received: 2022-09-27 Revised: 2023-02-22 Accepted: 2024-01-25 Online: 2024-01-25 Published: 2024-01-15
Supported by:
Key Research and Development Program of the Hefei Science Center, Chinese Academy of Sciences (2019HSC-KPRD003); Hefei Comprehensive National Science Center Project (QGCYY04)

Abstract

Abstract: Deploying convolutional neural networks (CNNs) on embedded devices with limited computing and storage resources suffers from slow execution, low computational efficiency, and high power consumption. This paper proposes a novel CNN acceleration architecture based on a heterogeneous platform, and designs and implements a lightweight CNN acceleration system based on MobileNet. First, to reduce hardware resource consumption and data transfer costs, dynamic fixed-point quantization and batch normalization fusion are used to optimize the network model, which also lowers the hardware design complexity of the acceleration system. Second, convolution tiling, parallel convolution computation, and data flow optimization improve the efficiency of convolution operations and the system throughput. Experimental results on the PYNQ-Z2 platform show that the MobileNet inference acceleration scheme implemented by this system recognizes a single image in 0.18 s at a system power consumption of 2.62 W, a 128-fold speedup over a single-core ARM processor.

Key words: field programmable gate array (FPGA), Vivado high-level synthesis, convolutional neural network, heterogeneous platform, hardware acceleration
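
The abstract names batch normalization fusion and dynamic fixed-point quantization without giving details, so the following is only a minimal illustrative sketch of how these two steps are commonly done, not the authors' implementation. The function names, the 8-bit word length, the per-tensor choice of fractional length, and `eps` are all assumptions made for this example.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel batch-norm parameters into conv weights and bias.

    weights: one flat list of weights per output channel (hypothetical layout).
    After folding, sum(w_folded * x) + b_folded equals BN(sum(w * x) + b).
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # per-channel BN scale
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - mu) * scale + bt)
    return folded_w, folded_b


def choose_frac_bits(values, total_bits=8):
    """Pick a fractional length for one tensor from its largest magnitude.

    Assumption: one sign bit, and just enough integer bits to cover max |v|.
    """
    max_abs = max(abs(v) for v in values)
    int_bits = 0 if max_abs < 1 else math.floor(math.log2(max_abs)) + 1
    return total_bits - 1 - int_bits


def quantize(values, frac_bits, total_bits=8):
    """Round to signed fixed-point integers, saturating at the word limits."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scale = 2.0 ** frac_bits
    return [min(hi, max(lo, round(v * scale))) for v in values]


def dequantize(q_values, frac_bits):
    """Map fixed-point integers back to real values."""
    scale = 2.0 ** frac_bits
    return [q / scale for q in q_values]
```

Folding removes the batch-norm layer at inference time, so the hardware only has to implement convolution with a bias. Choosing the fractional length per tensor is what makes the fixed-point format "dynamic": tensors with small values keep more fractional precision, while tensors with large values get more integer bits.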

QIN Wen-qiang, WU Zhong-cheng, ZHANG Jun, LI Fang. Design of convolutional neural network acceleration system based on heterogeneous platform[J]. Computer Engineering & Science, 2024, 46(01): 12-20.
