
Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (07): 1149-1158.

• High Performance Computing •

A deep learning programming framework for FT-Matrix DSP+MatrixZone heterogeneous systems

KANG Yu-han1, SHI Yang2, CHEN Zhao-yun2, WEN Mei2

  1. School of Information Science and Engineering, Hunan Normal University, Changsha 410081, China;
  2. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received: 2023-01-01  Revised: 2023-03-27  Accepted: 2023-07-25  Online: 2023-07-25  Published: 2023-07-11
  • Supported by: National Natural Science Foundation of China (62002366)

Abstract: To meet deep learning models' demands for rapid iteration and high computing power, mainstream hardware vendors increasingly favor heterogeneous systems that pair general-purpose processors with AI-specific accelerator cores. However, AI-specific accelerator cores support only a subset of core operators and lack general programmability, so efficiently deploying deep learning tasks on such heterogeneous architectures deserves in-depth study. Based on the domestically developed FT-Matrix DSP+MatrixZone heterogeneous system platform, this paper designs and implements a deep learning programming framework called KaiSa. KaiSa analyzes the input parameters of a deep learning model, identifies each operator's type, and assigns the operator to the corresponding computing core. For complex operators, KaiSa automatically searches for the optimal block size based on a performance model, improving the performance of dual-core parallel computing. At the same time, KaiSa hides all low-level hardware details, providing users with a friendly programming environment for efficient program development. Experimental results show that KaiSa achieves performance improvements of up to 39.0%.
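
The abstract describes two mechanisms: routing each operator to either the general-purpose DSP core or the AI accelerator core, and choosing a block (tile) size for complex operators by minimizing a performance model. The Python sketch below illustrates that flow only; every name in it (Operator, dispatch, predicted_time, search_tile_size, the supported-operator set, and the toy cost model) is an assumption made for illustration, not KaiSa's actual interface or model.

# Illustrative sketch, not KaiSa's real API:
# (1) dispatch each operator to the accelerator core if it is supported there,
#     otherwise to the general-purpose DSP core, and
# (2) for "complex" operators, pick a block (tile) size by minimizing a
#     simple analytical performance model.

from dataclasses import dataclass

# Operator types the hypothetical accelerator core supports natively.
SUPPORTED_ON_ACCELERATOR = {"conv2d", "matmul"}

@dataclass
class Operator:
    name: str        # operator type, e.g. "conv2d", "softmax"
    workload: int    # total number of multiply-accumulate operations
    is_complex: bool # complex operators get an automatic tile-size search

def dispatch(op: Operator) -> str:
    """Route an operator to the accelerator core or the general DSP core."""
    return "accelerator" if op.name in SUPPORTED_ON_ACCELERATOR else "dsp"

def predicted_time(op: Operator, tile: int) -> float:
    """Toy cost model: larger tiles improve data reuse, but each tile adds
    launch/transfer overhead, and tiles beyond the on-chip capacity spill."""
    num_tiles = -(-op.workload // tile)                 # ceiling division
    compute = op.workload / (tile ** 0.5)               # reuse improves with tile size
    overhead = 5.0 * num_tiles                          # fixed cost per tile
    spill = 0.0 if tile <= 256 else 0.2 * op.workload   # toy on-chip capacity limit
    return compute + overhead + spill

def search_tile_size(op: Operator, candidates=(64, 128, 256, 512, 1024)) -> int:
    """Pick the candidate tile size the cost model predicts is fastest."""
    return min(candidates, key=lambda t: predicted_time(op, t))

if __name__ == "__main__":
    model = [
        Operator("conv2d", workload=1 << 20, is_complex=True),
        Operator("softmax", workload=1 << 14, is_complex=False),
    ]
    for op in model:
        core = dispatch(op)
        tile = search_tile_size(op) if op.is_complex else None
        print(f"{op.name:<8} -> {core:<11} tile={tile}")

In this toy setup, conv2d is routed to the accelerator core and the search settles on a mid-sized tile (the overhead and spill terms penalize the extremes), while softmax falls back to the DSP core with no tiling; the paper's actual dispatch rules and performance model are not given in the abstract.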

Key words: deep learning; FT-Matrix; MatrixZone; heterogeneous system; performance optimization