• A journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (09): 1547-1553.

• High Performance Computing •

A neural network pruning and quantization algorithm for hardware deployment

WANG Peng1,2, ZHANG Jia-cheng1, FAN Yu-yang1,2

  (1. Key Laboratory of Civil Aircraft Airworthiness Technology, Civil Aviation University of China, Tianjin 300399, China;
   2. College of Safety Science and Engineering, Civil Aviation University of China, Tianjin 300399, China)
  • Received: 2022-07-18  Revised: 2023-05-22  Accepted: 2024-09-25  Online: 2024-09-25  Published: 2024-09-19

Abstract: Owing to their superior performance, deep neural networks are widely used in fields such as image recognition and object detection. However, their large parameter counts and heavy computational demands make them difficult to deploy on mobile edge devices, which require low latency and low power consumption. To address this problem, a compression algorithm is proposed that replaces multiplication operations with bit shifts and additions, compressing neural network parameters to low bit-widths through pruning and quantization. The algorithm eases hardware deployment when multiplier resources are limited, meets the low-latency and low-power requirements of mobile edge devices, and improves operational efficiency. Experiments on classical neural networks with the ImageNet dataset show that when the parameters are compressed to 4 bits, accuracy remains essentially unchanged relative to the full-precision networks; for ResNet18, ResNet50, and GoogLeNet, the Top-1/Top-5 accuracies even improve by 0.38%/0.22%, 0.35%/0.21%, and 1.14%/0.57%, respectively. When the eighth convolutional layer of VGG16 is deployed on a Zynq7035, the compressed network reduces inference time by 51.1% and power consumption by 46.7% while using 43% fewer DSP resources.
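To make the two core ideas concrete, the sketch below illustrates magnitude-based pruning followed by power-of-two quantization, which is what allows a multiply x * w to be realized in hardware as a bit shift of x plus sign handling. This is a minimal illustration, not the authors' exact method: the pruning criterion, the exponent range, and the 4-bit code allocation are all assumptions made for the example.

```python
# Minimal sketch (assumed, not the paper's exact algorithm): prune small
# weights, then round survivors to signed powers of two so that x * w
# becomes a shift of x in fixed-point hardware.
import numpy as np

def prune_by_magnitude(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (assumed criterion)."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < thresh, 0.0, w)

def quantize_power_of_two(w, bits=4):
    """Round each surviving weight to the nearest signed power of two.

    With b bits we assume 1 sign bit and (b - 1) bits of exponent code, so
    the exponent e is clipped to a small range and w ~= sign(w) * 2**e.
    A multiply x * w then reduces to shifting x by |e| positions.
    """
    sign = np.sign(w)
    mag = np.abs(w)
    # Guard log2 against pruned (zero) weights; they simply stay zero below.
    e = np.where(mag > 0, np.round(np.log2(np.maximum(mag, 1e-12))), 0)
    n_levels = 2 ** (bits - 1)   # number of exponent codes available
    e_max = 0                    # assumes normalized weights with |w| <= 1
    e_min = e_max - (n_levels - 1)
    e = np.clip(e, e_min, e_max)
    return np.where(mag > 0, sign * 2.0 ** e, 0.0), e.astype(int)

# Toy usage: one "layer" of weights, pruned then quantized to 4 bits.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.3, size=8)
w_q, exps = quantize_power_of_two(prune_by_magnitude(w), bits=4)
print(w.round(3), w_q, exps, sep="\n")
# In fixed-point hardware y = x * 2**e is (x << e) or (x >> -e), so each
# surviving multiply costs one shift and the accumulations cost adds only,
# which is why such a network needs far fewer DSP multiplier blocks.
```

Because every quantized weight is a signed power of two, no general-purpose multipliers are needed at inference time, which is consistent with the DSP, latency, and power savings reported above.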

Key words: deep neural networks, hardware, pruning, quantization, FPGA