• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (09): 1547-1553.

• High Performance Computing •

A neural network pruning and quantization algorithm for hardware deployment

WANG Peng1,2, ZHANG Jia-cheng1, FAN Yu-yang1,2

  1. Key Laboratory of Civil Aircraft Airworthiness Technology, Civil Aviation University of China, Tianjin 300399, China;
    2. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300399, China
  • Received: 2022-07-18 Revised: 2023-05-22 Accepted: 2024-09-25 Online: 2024-09-25 Published: 2024-09-19
  • Supported by: National Key R&D Program of China (2021YFB1600600)

Abstract: Owing to their superior performance, deep neural networks have been widely applied in fields such as image recognition and object detection. However, they contain a large number of parameters and demand heavy computation, which makes them difficult to deploy on mobile edge devices that require low latency and low power consumption. To address this problem, a compression algorithm is proposed that replaces multiplications with shift-and-add operations, compressing the network parameters to low bit-widths through pruning and quantization. The algorithm lowers the difficulty of hardware deployment when multiplier resources are limited, satisfies the low-latency and low-power requirements of mobile edge devices, and improves operational efficiency. Experiments on classical neural networks with the ImageNet dataset show that, with parameters compressed to 4 bits, accuracy remains essentially on par with the full-precision networks; on ResNet18, ResNet50, and GoogLeNet, the Top-1/Top-5 accuracies even improve by 0.38%/0.22%, 0.35%/0.21%, and 1.14%/0.57%, respectively. In a further experiment, the eighth convolutional layer of VGG16 was deployed on a Zynq7035 FPGA; the compressed network shortened inference time by 51.1% and reduced power consumption by 46.7% while using 43% fewer DSP resources.
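
To make the shift-and-add idea concrete, the sketch below illustrates one way power-of-two quantization can turn multiplications into shifts. It is an illustrative sketch under stated assumptions, not the paper's implementation: the 4-bit code is assumed to hold a sign plus a 3-bit shift amount with a per-layer scale factor, and the names quantize_pow2 and shift_add_dot are hypothetical.

    import math

    def quantize_pow2(w, scale, n_exp_bits=3):
        # Map a real weight to a (sign, shift) code so that
        # w ~= sign * scale * 2**shift; sign == 0 marks a pruned weight.
        if w == 0.0:
            return (0, 0)
        sign = 1 if w > 0 else -1
        shift = round(math.log2(abs(w) / scale))        # nearest power of two
        shift = max(0, min(shift, 2 ** n_exp_bits - 1)) # clamp to code range
        return (sign, shift)

    def shift_add_dot(codes, acts):
        # Dot product using only shifts and adds on integer activations.
        acc = 0
        for (sign, shift), a in zip(codes, acts):
            if sign == 0:
                continue               # pruned weight: no work at all
            term = a << shift          # multiply by 2**shift via a shift
            acc += term if sign > 0 else -term
        return acc

    # Toy usage: quantize a few weights (one pruned to zero), then compare
    # the shift-add result with the exact floating-point dot product.
    scale = 0.01
    weights = [0.08, -0.31, 0.0, 0.04]
    acts = [3, 5, 7, 2]                # integer (already quantized) activations
    codes = [quantize_pow2(w, scale) for w in weights]
    print(codes)                               # [(1, 3), (-1, 5), (0, 0), (1, 2)]
    print(shift_add_dot(codes, acts) * scale)  # -1.28, vs. exact value -1.23

On an FPGA, each such shift-add term maps onto plain logic fabric rather than a DSP multiplier block, which is consistent with the 43% reduction in DSP usage reported above.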

Key words: deep neural networks, hardware, pruning, quantization, FPGA