Design and simulation of a deep learning SoC architecture

Computer Engineering & Science

CUI Haoran, LI Han, FENG Yujing, WU Meng, WANG Chao, TAO Guanliang, ZHANG Zhimin

(Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100094, China)

Received: 2018-08-25  Revised: 2018-10-17  Online: 2019-01-25  Published: 2019-01-25

Abstract

The explosive growth of information in the Internet era and the spread of deep learning have left traditional general-purpose computing unable to meet large-scale, high-concurrency computing demands. Heterogeneous computing can unlock greater computing power for deep learning, meet higher performance requirements, and apply to a wider range of computing scenarios. We design and simulate a complete heterogeneous computing SoC architecture for deep learning algorithms. First, we analyze the computational characteristics of commonly used deep learning algorithms such as GoogleNet, LSTM, and SSD, group them into a limited number of common operator classes, present these classes in charts and structural block diagrams, and generate a pseudo-instruction stream at the minimum operator level. Second, based on the extracted algorithm characteristics, we design a hardware-accelerated AI IP core for deep learning and construct a heterogeneous computing SoC architecture. Finally, experiments on a simulation modeling platform show that the performance-to-power ratio of the SoC system exceeds 1.5 TOPS/W, and that the system can process 10 channels of 1080p 30 fps video frame by frame with the GoogleNet algorithm, with an end-to-end processing time of no more than 30 ms per frame.

Key words: heterogeneous computing, deep learning, acceleration unit, simulation modeling
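
The throughput figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the channel count, frame rate, and latency bound come from the abstract, while the function names and the use of Little's law to estimate concurrency are assumptions introduced here and are not taken from the paper.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# From the abstract: 10 channels, 30 fps per channel, <= 30 ms per frame.
# Hypothetical helpers: names and the Little's-law framing are assumptions.

CHANNELS = 10
FPS_PER_CHANNEL = 30
MAX_LATENCY_MS = 30  # end-to-end bound per frame


def required_frame_rate(channels: int, fps: int) -> int:
    """Aggregate frames per second the system must sustain."""
    return channels * fps


def serial_budget_ms(frame_rate: int) -> float:
    """Average time available per frame if frames were handled one at a time."""
    return 1000.0 / frame_rate


def frames_in_flight(frame_rate: int, latency_ms: int) -> int:
    """Little's law (items in flight = arrival rate * time in system),
    rounded up with integer arithmetic to avoid floating-point error."""
    return (frame_rate * latency_ms + 999) // 1000


if __name__ == "__main__":
    rate = required_frame_rate(CHANNELS, FPS_PER_CHANNEL)
    print(f"aggregate frame rate: {rate} frames/s")
    print(f"serial budget per frame: {serial_budget_ms(rate):.2f} ms")
    print(f"frames in flight at the latency bound: "
          f"{frames_in_flight(rate, MAX_LATENCY_MS)}")
```

Under these assumptions the aggregate load is 300 frames/s, which leaves about 3.33 ms per frame if frames were processed strictly one after another. If each frame used the full 30 ms budget, roughly 9 frames would have to be in flight at once, which suggests that the stated figures rely on pipelining or parallel processing across channels.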

CUI Haoran, LI Han, FENG Yujing, WU Meng, WANG Chao, TAO Guanliang, ZHANG Zhimin. Design and simulation of a deep learning SoC architecture[J]. Computer Engineering & Science.
