计算机工程与科学 (Computer Engineering & Science) ›› 2023, Vol. 45 ›› Issue (07): 1141-1148.

• High Performance Computing •

Running optimization of deep learning accelerators under different pruning strategies

YI Xiao, MA Sheng, XIAO Nong

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

  • Received: 2021-12-08  Revised: 2022-02-25  Accepted: 2023-07-25  Online: 2023-07-25  Published: 2023-07-11
  • Supported by:
    National Natural Science Foundation of China (62172430); Natural Science Foundation of Hunan Province (2021JJ10052); Science and Technology Innovation Program of Hunan Province (2022RC3065)

Abstract: Convolutional neural networks have achieved great success in the field of image analysis. As deep learning develops, deep learning models become increasingly complex and their computational cost grows rapidly. Sparsification can effectively reduce this cost without reducing accuracy. This paper applies three different pruning strategies to the ResNet18 and GoogleNet models to reduce their computational cost. The results show that, without reducing accuracy, global unstructured pruning achieves sparsity of 94% on ResNet18 and 90% on GoogleNet; with essentially no accuracy loss, layer-wise unstructured pruning achieves sparsity of 83% and 56% respectively; and with a slight accuracy loss, layer-wise structured pruning achieves sparsity of 34% and 22% respectively. The pruned models under the three strategies are then run on the Eyeriss deep learning accelerator architecture to measure latency and power consumption. Compared with the unpruned baseline, on ResNet18, global unstructured pruning reduces latency by 66.0% and power consumption by 60.7%, layer-wise unstructured pruning reduces latency by 88.1% and power consumption by 80.6%, and layer-wise structured pruning reduces latency by 65.6% and power consumption by 33.5%. On GoogleNet, global unstructured pruning reduces latency by 74.5% and power consumption by 63.2%, layer-wise unstructured pruning reduces latency by 73.6% and power consumption by 55.0%, and layer-wise structured pruning reduces latency by 26.8% and power consumption by 5.8%. Therefore, this paper concludes that global unstructured pruning can greatly reduce latency and energy consumption without reducing accuracy, and that layer-wise unstructured pruning can greatly reduce latency and energy consumption at the cost of a slight accuracy drop.
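For readers who want a qualitative feel for the comparison, the following is a minimal sketch, not the authors' implementation, of how the three pruning strategies can be expressed with PyTorch's torch.nn.utils.prune API on a torchvision ResNet18. The pruning ratios and the L1/L2 magnitude criteria below are illustrative assumptions, not the paper's tuned settings.

```python
# Sketch of the three pruning strategies compared in the paper, applied to the
# convolutional layers of a torchvision ResNet18. Ratios are placeholders.
import torch.nn as nn
import torch.nn.utils.prune as prune
import torchvision.models as models


def conv_layers(model):
    """Collect all convolutional layers, the usual pruning targets."""
    return [m for m in model.modules() if isinstance(m, nn.Conv2d)]


def global_unstructured(model, amount=0.9):
    """Global unstructured pruning: one magnitude threshold across all layers."""
    params = [(m, "weight") for m in conv_layers(model)]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)


def layerwise_unstructured(model, amount=0.8):
    """Layer-wise unstructured pruning: prune each layer to the same ratio."""
    for m in conv_layers(model):
        prune.l1_unstructured(m, name="weight", amount=amount)


def layerwise_structured(model, amount=0.3):
    """Layer-wise structured pruning: remove whole output channels per layer."""
    for m in conv_layers(model):
        prune.ln_structured(m, name="weight", amount=amount, n=2, dim=0)


def overall_sparsity(model):
    """Fraction of zero-valued weights over all convolutional layers."""
    zeros, total = 0, 0
    for m in conv_layers(model):
        zeros += int((m.weight == 0).sum())
        total += m.weight.nelement()
    return zeros / total


if __name__ == "__main__":
    model = models.resnet18()
    global_unstructured(model, amount=0.9)
    print(f"overall conv sparsity: {overall_sparsity(model):.2%}")
```

Global unstructured pruning ranks all weights against a single threshold, which is why it typically reaches a higher overall sparsity at the same accuracy than pruning every layer to a fixed ratio, while structured (channel) pruning removes whole filters and trades sparsity for hardware-friendly regularity.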

Key words: deep learning accelerator, convolutional neural network, pruning