计算机工程与科学 (Computer Engineering & Science) ›› 2023, Vol. 45 ›› Issue (07): 1141-1148.

• High Performance Computing •

Running optimization of deep learning accelerators under different pruning strategies

YI Xiao, MA Sheng, XIAO Nong

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

  • Received: 2021-12-08  Revised: 2022-02-25  Accepted: 2023-07-25  Online: 2023-07-25  Published: 2023-07-11
  • Supported by:
    National Natural Science Foundation of China (62172430); Natural Science Foundation of Hunan Province (2021JJ10052); Science and Technology Innovation Program of Hunan Province (2022RC3065)

Abstract: Convolutional neural networks have achieved great success in the field of image analysis. As deep learning develops, deep learning models become increasingly complex and their computational cost grows rapidly. Sparsification can effectively reduce this cost without reducing accuracy. This paper applies three different pruning strategies to the ResNet18 and GoogleNet models to reduce their computational cost. The results show that, without reducing accuracy, global unstructured pruning achieves sparsity of 94% on ResNet18 and 90% on GoogleNet; with essentially no accuracy loss, layer-wise unstructured pruning achieves sparsity of 83% and 56% respectively; and with a slight accuracy loss, layer-wise structured pruning achieves sparsity of 34% and 22% respectively. The pruned models under the three strategies are then run on the Eyeriss deep learning accelerator architecture to measure latency and power consumption. Compared with the unpruned baseline, on ResNet18, global unstructured pruning reduces latency by 66.0% and power consumption by 60.7%, layer-wise unstructured pruning reduces latency by 88.1% and power consumption by 80.6%, and layer-wise structured pruning reduces latency by 65.6% and power consumption by 33.5%. On GoogleNet, global unstructured pruning reduces latency by 74.5% and power consumption by 63.2%, layer-wise unstructured pruning reduces latency by 73.6% and power consumption by 55.0%, and layer-wise structured pruning reduces latency by 26.8% and power consumption by 5.8%. Therefore, this paper concludes that global unstructured pruning can greatly reduce latency and energy consumption without reducing accuracy, and that layer-wise unstructured pruning can greatly reduce latency and energy consumption at the cost of a slight accuracy drop.
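For readers who want a qualitative feel for the comparison, the following is a minimal sketch, not the authors' implementation, of how the three pruning strategies can be expressed with PyTorch's torch.nn.utils.prune API on a torchvision ResNet18. The pruning ratios and the L1/L2 magnitude criteria below are illustrative assumptions, not the paper's tuned settings.

```python
# Sketch of the three pruning strategies compared in the paper, applied to the
# convolutional layers of a torchvision ResNet18. Ratios are placeholders.
import torch.nn as nn
import torch.nn.utils.prune as prune
import torchvision.models as models


def conv_layers(model):
    """Collect all convolutional layers, the usual pruning targets."""
    return [m for m in model.modules() if isinstance(m, nn.Conv2d)]


def global_unstructured(model, amount=0.9):
    """Global unstructured pruning: one magnitude threshold across all layers."""
    params = [(m, "weight") for m in conv_layers(model)]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)


def layerwise_unstructured(model, amount=0.8):
    """Layer-wise unstructured pruning: prune each layer to the same ratio."""
    for m in conv_layers(model):
        prune.l1_unstructured(m, name="weight", amount=amount)


def layerwise_structured(model, amount=0.3):
    """Layer-wise structured pruning: remove whole output channels per layer."""
    for m in conv_layers(model):
        prune.ln_structured(m, name="weight", amount=amount, n=2, dim=0)


def overall_sparsity(model):
    """Fraction of zero-valued weights over all convolutional layers."""
    zeros, total = 0, 0
    for m in conv_layers(model):
        zeros += int((m.weight == 0).sum())
        total += m.weight.nelement()
    return zeros / total


if __name__ == "__main__":
    model = models.resnet18()
    global_unstructured(model, amount=0.9)
    print(f"overall conv sparsity: {overall_sparsity(model):.2%}")
```

Global unstructured pruning ranks all weights against a single threshold, which is why it typically reaches a higher overall sparsity at the same accuracy than pruning every layer to a fixed ratio, while structured (channel) pruning removes whole filters and trades sparsity for hardware-friendly regularity.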

Key words: deep learning accelerator, convolutional neural network, pruning