• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (4): 582-591.

• High Performance Computing •

Design and FPGA implementation of lightweight convolutional neural network hardware acceleration

LI Zhenqi, WANG Qiang, QI Xingyun, LAI Mingche, ZHAO Yankang, LU Yihang, LI Yuan

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
  • Received: 2023-08-18  Revised: 2024-07-09  Online: 2025-04-25  Published: 2025-04-17

Abstract: In recent years, convolutional neural networks (CNNs) have achieved remarkable results in fields such as computer vision. However, CNNs typically have complex network structures and substantial computational requirements, making them difficult to deploy on portable devices with limited computational resources and power budgets. FPGAs, with their high parallelism, energy efficiency, and reconfigurability, have become one of the most effective computing platforms for accelerating CNN inference on portable devices. This paper proposes a CNN accelerator that can be configured for different network structures, and optimizes its latency and power consumption in three aspects: data reuse, pipeline optimization based on line buffers, and low-latency convolution based on adder trees. Taking the lightweight YOLOv2-tiny network model as an example, a real-time object detection system was built on the Navigator ZYNQ-7020 development board. Experimental results show that the design meets the low hardware-cost and low-power requirements of portable devices, consuming 88% of on-chip resources and 2.959 W of power, and achieving a detection speed of 3.91 fps at an image resolution of 416×256.
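The low-latency convolution via adder trees mentioned in the abstract can be illustrated with a small software model. This sketch is illustrative only (it is not the paper's hardware implementation, and all names are hypothetical): summing the nine products of a 3×3 window by balanced pairwise reduction takes ⌈log₂ 9⌉ = 4 adder stages in hardware, versus 8 sequential additions for a naive accumulator chain.

```python
# Illustrative software model of an adder-tree reduction, as used for
# low-latency convolution accumulation in FPGA designs. Function names
# are hypothetical, not from the paper.

def adder_tree_sum(values):
    """Sum a list by balanced pairwise reduction.

    Each pass models one hardware adder stage; a 9-input tree finishes
    in 4 stages instead of 8 chained additions.
    """
    vals = list(values)
    while len(vals) > 1:
        paired = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:          # odd element carries over to next stage
            paired.append(vals[-1])
        vals = paired
    return vals[0]

def conv3x3(window, kernel):
    """One output pixel of a 3x3 convolution: nine parallel multiplies,
    then a tree-reduced accumulation of the products."""
    products = [w * k for w, k in zip(window, kernel)]
    return adder_tree_sum(products)

# Example: a 3x3 window (flattened row-major) with a vertical-edge kernel.
window = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 0, -1, 1, 0, -1, 1, 0, -1]
print(conv3x3(window, kernel))  # prints -6, same as the plain dot product
```

In hardware, each `while` pass corresponds to a pipeline stage of adders operating in parallel, so output latency grows logarithmically rather than linearly with the number of multiply results to accumulate.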

Key words: convolutional neural network (CNN), FPGA acceleration, accelerator, portable device