• A journal of the China Computer Federation (CCF)
  • A Chinese core journal of science and technology
  • A Chinese core journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (04): 577-581.

• High Performance Computing •

A universal design on hardware acceleration of convolutional neural networks

WANG Yu-lei,XIE Kai-liang,CHEN Si-yun,HU Jie,CHANG Sheng   

  1. (School of Physics and Technology, Wuhan University, Wuhan 430072, China)
  • Received:2022-01-19 Revised:2022-05-09 Accepted:2023-04-25 Online:2023-04-25 Published:2023-04-13

Abstract: With the rise of artificial intelligence, neural network algorithms for various application scenarios are developing rapidly and changing constantly. This makes general edge-deployment acceleration of such algorithms, represented by convolutional neural networks, a significant challenge. In view of this situation, based on the principle of data correlation and the Roofline model, a universal design rule is proposed for hardware-parallelized convolutional neural networks. The three most important components, namely the convolutional layer, the pooling layer, and the fully connected layer, are optimized. From these optimized modules, various convolutional neural networks can be built according to the requirements of the application scenario, thus achieving a universal design. With the LeNet-5 network as the verification object and the MNIST test set as the benchmark, verification was carried out on the XILINX ZC702 and XILINX ZC706 FPGA platforms. The interactive recognition system, built with high-level synthesis after optimizing each layer, achieves 95.09% accuracy and an inference speed of 4.1 ms per image on the XILINX ZC702 platform, and the same accuracy with an inference speed of 0.997 ms per image on the XILINX ZC706 platform. Both achieve very high processing speed.
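The Roofline-guided design mentioned in the abstract can be illustrated with a minimal sketch. The idea is that a layer's attainable performance is bounded either by the platform's peak compute rate or by memory bandwidth times the layer's arithmetic intensity (FLOPs per byte moved). All hardware numbers and layer shapes below are illustrative assumptions, not figures from the paper:

```python
# Minimal sketch of the Roofline model used to reason about CNN layer
# acceleration. Hardware numbers and the layer shape are hypothetical,
# not measurements from the paper or the ZC702/ZC706 boards.

def conv_ops_and_bytes(c_in, c_out, k, h_out, w_out, bytes_per_elem=4):
    """Return (operations, bytes moved) for one stride-1, unpadded conv layer,
    counting each input, weight, and output element as moved exactly once
    (i.e., assuming ideal on-chip reuse)."""
    ops = 2 * c_in * c_out * k * k * h_out * w_out   # multiply + accumulate
    h_in, w_in = h_out + k - 1, w_out + k - 1        # input size for stride 1
    elems = (c_in * h_in * w_in                      # input feature map
             + c_in * c_out * k * k                  # weights
             + c_out * h_out * w_out)                # output feature map
    return ops, elems * bytes_per_elem

def attainable_gflops(ops, bytes_moved, peak_gflops, bw_gb_s):
    """Roofline: min(peak compute, bandwidth x arithmetic intensity)."""
    intensity = ops / bytes_moved                    # FLOPs per byte
    return min(peak_gflops, bw_gb_s * intensity)

# Example: a LeNet-5-like first conv layer (1 -> 6 channels, 5x5 kernel,
# 28x28 output) on an assumed 10 GFLOP/s, 4 GB/s platform.
ops, nbytes = conv_ops_and_bytes(c_in=1, c_out=6, k=5, h_out=28, w_out=28)
perf = attainable_gflops(ops, nbytes, peak_gflops=10.0, bw_gb_s=4.0)
print(f"intensity = {ops / nbytes:.2f} FLOP/byte, attainable = {perf:.2f} GFLOP/s")
```

Under these assumed numbers the layer lands in the compute-bound region of the roofline, which is the regime where adding parallel multiply-accumulate units (as the paper's layer optimizations do) pays off directly.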

Key words: neural network, hardware acceleration, universal design, FPGA, high-level synthesis, Roofline, data correlation