• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2025, Vol. 47, Issue (4): 582-591.

• High Performance Computing •

Design and FPGA implementation of lightweight convolutional neural network hardware acceleration

LI Zhenqi (李珍琪), WANG Qiang (王强), QI Xingyun (齐星云), LAI Mingche (赖明澈), ZHAO Yankang (赵言亢), LU Yihang (陆亿行), LI Yuan (黎渊)

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
  • Received: 2023-08-18 Revised: 2024-07-09 Online: 2025-04-25 Published: 2025-04-17

Abstract: In recent years, convolutional neural networks (CNNs) have achieved remarkable results in fields such as computer vision. However, CNNs typically have complex network structures and heavy computational requirements, which makes them difficult to deploy on portable devices with limited computing resources and power budgets. FPGAs, with their high parallelism, energy efficiency, and reconfigurability, have emerged as one of the most effective computing platforms for accelerating CNN inference on portable devices. This paper presents a CNN accelerator that can be configured for different network structures, and optimizes its latency and power consumption in three ways: data reuse, pipeline optimization based on row buffers, and low-latency convolution based on adder trees. Taking the lightweight network YOLOv2-tiny as an example, a real-time object detection system was built on the Navigator ZYNQ-7020 development board. Experimental results show that the complete design uses 88% of the available resources and consumes 2.959 W, which meets the low hardware cost and low power requirements of portable devices. At an image resolution of 416×256, it achieves a detection speed of 3.91 fps.

Key words: convolutional neural network (CNN), FPGA acceleration, accelerator, portable device
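
The abstract names row buffers and adder trees as two of its optimization techniques but gives no implementation detail. The following Python sketch is only a software illustration of those two general ideas, not the authors' hardware design: the function names `adder_tree_sum` and `conv3x3_line_buffer`, the 3×3 kernel size, and the integer data type are all assumptions made for the example. It keeps the three most recent image rows in a small buffer so that each row is read from the input stream once, and it reduces the nine products of each window pairwise, level by level, the way a balanced adder tree would.

```python
from collections import deque


def adder_tree_sum(values):
    """Sum values pairwise, level by level, like a balanced adder tree.

    Returns (total, depth), where depth is the number of adder levels used.
    """
    level = list(values)
    depth = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through to the next level
        level = nxt
        depth += 1
    return (level[0] if level else 0), depth


def conv3x3_line_buffer(image, kernel):
    """Valid 3x3 convolution (cross-correlation form) over rows arriving in order."""
    width = len(image[0])
    rows = deque(maxlen=3)  # row buffer: holds only the three most recent rows
    out = []
    for row in image:  # each input row is consumed exactly once
        rows.append(row)
        if len(rows) < 3:
            continue  # not enough rows buffered for a full window yet
        out_row = []
        for x in range(width - 2):
            products = [rows[ky][x + kx] * kernel[ky][kx]
                        for ky in range(3) for kx in range(3)]
            total, _ = adder_tree_sum(products)
            out_row.append(total)
        out.append(out_row)
    return out
```

In this sketch, nine products are reduced in four adder levels (the ceiling of log2 of 9) rather than eight sequential additions, which is the usual motivation for an adder tree in a pipelined datapath. The row buffer illustrates data reuse: each input row contributes to up to three output rows without being fetched again.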