A universal design on hardware acceleration of convolutional neural networks

Computer Engineering & Science, 2023, Vol. 45, Issue (04): 577-581.

WANG Yu-lei (王玉雷), XIE Kai-liang (谢凯亮), CHEN Si-yun (陈思贇), HU Jie (胡杰), CHANG Sheng (常胜)

(School of Physics and Technology, Wuhan University, Wuhan 430072, China)

Received: 2022-01-19 Revised: 2022-05-09 Accepted: 2023-04-25 Online: 2023-04-25 Published: 2023-04-13

Abstract

Abstract: With the rise of artificial intelligence, neural network algorithms for a wide range of application scenarios are developing rapidly. This makes a general-purpose accelerated edge deployment design for such algorithms, of which convolutional neural networks are representative, a major challenge. To address this, general and universal design rules based on the principle of data correlation and the Roofline model are proposed, and neural networks are parallelized for hardware acceleration accordingly. The three most important parts, namely the convolutional layer, the pooling layer, and the fully connected layer, are optimized. Based on the optimized modules, various convolutional neural networks can be built according to the requirements of the application scenario, thereby achieving a universal design. With the LeNet-5 network as the verification object and the MNIST test set as the benchmark, verification was carried out on the XILINX ZC702 and XILINX ZC706 FPGA platforms. The interactive recognition system, built with high-level synthesis after each layer was optimized, achieves 95.09% accuracy and an inference speed of 4.1 ms per image on the XILINX ZC702 platform, and the same accuracy with an inference speed of 0.997 ms per image on the XILINX ZC706 platform. Both platforms achieve high processing speed.

Key words: neural network, hardware acceleration, universal design, FPGA, high-level synthesis, Roofline, data correlation
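The abstract names the Roofline model as one basis for the design rules but does not show how it is applied. The following is a minimal sketch of the standard Roofline bound, not the authors' procedure. The function names, the layer shape (a LeNet-5-style first convolution with a 32x32 input, six 5x5 kernels, and a 28x28 output), the 32-bit data width, and the peak-compute and bandwidth figures are all illustrative assumptions and are not taken from the paper or from either FPGA platform.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def conv_intensity(out_h: int, out_w: int, out_ch: int, in_ch: int, k: int,
                   in_h: int, in_w: int, bytes_per_value: int = 4) -> float:
    """Operations per byte for one convolution layer.

    Assumes (for illustration) that the input, the weights, and the output
    each cross the off-chip interface exactly once, and counts one
    multiply-accumulate as two operations.
    """
    flops = 2 * out_h * out_w * out_ch * in_ch * k * k
    values_moved = in_h * in_w * in_ch + out_ch * in_ch * k * k + out_h * out_w * out_ch
    return flops / (bytes_per_value * values_moved)


# Hypothetical layer shape and hypothetical platform numbers (assumptions).
c1 = conv_intensity(out_h=28, out_w=28, out_ch=6, in_ch=1, k=5, in_h=32, in_w=32)
PEAK_GFLOPS = 20.0    # assumed compute roof
BANDWIDTH_GBS = 1.0   # assumed off-chip bandwidth

bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, c1)
regime = "memory-bound" if bound < PEAK_GFLOPS else "compute-bound"
print(f"intensity = {c1:.2f} ops/byte, bound = {bound:.2f} GFLOP/s ({regime})")
```

Under these assumed numbers the layer sits below the compute roof, which is the situation where raising data reuse (higher operational intensity) pays off more than adding parallel compute units. With real platform figures substituted, the same comparison indicates which of the two to optimize first.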

WANG Yu-lei, XIE Kai-liang, CHEN Si-yun, HU Jie, CHANG Sheng. A universal design on hardware acceleration of convolutional neural networks[J]. Computer Engineering & Science, 2023, 45(04): 577-581.
