• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (09): 1547-1553.

• High Performance Computing •

A neural network pruning and quantization algorithm for hardware deployment

WANG Peng1,2, ZHANG Jia-cheng1, FAN Yu-yang1,2

  1. Key Laboratory of Civil Aircraft Airworthiness Technology, Civil Aviation University of China, Tianjin 300399, China;
    2. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300399, China
  • Received: 2022-07-18 Revised: 2023-05-22 Accepted: 2024-09-25 Online: 2024-09-25 Published: 2024-09-19
  • Supported by: National Key R&D Program of China (2021YFB1600600)

Abstract: Owing to their superior performance, deep neural networks have been widely applied in fields such as image recognition and object detection. However, they contain a large number of parameters and demand heavy computation, which makes them difficult to deploy on mobile edge devices that require low latency and low power consumption. To address this problem, a compression algorithm is proposed that replaces multiplications with shift-and-add operations, compressing the network parameters to low bit-widths through pruning and quantization. The algorithm lowers the difficulty of hardware deployment when multiplier resources are limited, satisfies the low-latency and low-power requirements of mobile edge devices, and improves operational efficiency. Experiments on classical neural networks with the ImageNet dataset show that, with parameters compressed to 4 bits, accuracy remains essentially on par with the full-precision networks; on ResNet18, ResNet50, and GoogLeNet, the Top-1/Top-5 accuracies even improve by 0.38%/0.22%, 0.35%/0.21%, and 1.14%/0.57%, respectively. In a further experiment, the eighth convolutional layer of VGG16 was deployed on a Zynq7035 FPGA; the compressed network shortened inference time by 51.1% and reduced power consumption by 46.7% while using 43% fewer DSP resources.
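
To make the shift-and-add idea concrete, the sketch below illustrates one way power-of-two quantization can turn multiplications into shifts. It is an illustrative sketch under stated assumptions, not the paper's implementation: the 4-bit code is assumed to hold a sign plus a 3-bit shift amount with a per-layer scale factor, and the names quantize_pow2 and shift_add_dot are hypothetical.

    import math

    def quantize_pow2(w, scale, n_exp_bits=3):
        # Map a real weight to a (sign, shift) code so that
        # w ~= sign * scale * 2**shift; sign == 0 marks a pruned weight.
        if w == 0.0:
            return (0, 0)
        sign = 1 if w > 0 else -1
        shift = round(math.log2(abs(w) / scale))        # nearest power of two
        shift = max(0, min(shift, 2 ** n_exp_bits - 1)) # clamp to code range
        return (sign, shift)

    def shift_add_dot(codes, acts):
        # Dot product using only shifts and adds on integer activations.
        acc = 0
        for (sign, shift), a in zip(codes, acts):
            if sign == 0:
                continue               # pruned weight: no work at all
            term = a << shift          # multiply by 2**shift via a shift
            acc += term if sign > 0 else -term
        return acc

    # Toy usage: quantize a few weights (one pruned to zero), then compare
    # the shift-add result with the exact floating-point dot product.
    scale = 0.01
    weights = [0.08, -0.31, 0.0, 0.04]
    acts = [3, 5, 7, 2]                # integer (already quantized) activations
    codes = [quantize_pow2(w, scale) for w in weights]
    print(codes)                               # [(1, 3), (-1, 5), (0, 0), (1, 2)]
    print(shift_add_dot(codes, acts) * scale)  # -1.28, vs. exact value -1.23

On an FPGA, each such shift-add term maps onto plain logic fabric rather than a DSP multiplier block, which is consistent with the 43% reduction in DSP usage reported above.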

Key words: deep neural networks, hardware, pruning, quantization, FPGA