Research on an interference-aware online scheduling algorithm for deep learning jobs

Computer Engineering & Science, 2024, Vol. 46, Issue (12): 2138-2148.

Funding:
National Natural Science Foundation of China (62362018)

OASIS: An interference-aware online scheduling algorithm for deep learning jobs

JING Chao1,2, BI Yu-shen1

(1. College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China;
2. Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin 541006, China)

Received: 2023-06-06  Revised: 2023-10-06  Accepted: 2024-12-25  Online: 2024-12-25  Published: 2024-12-23

Abstract

Abstract: Because GPUs accelerate deep learning jobs, many researchers try to shorten job completion time by raising GPU utilization. Unlike the traditional approach, in which each job holds a GPU exclusively, this paper considers job colocation (running multiple jobs on the same GPU at the same time, which can raise GPU utilization and shorten job completion time) and proposes OASIS, an interference-aware online scheduling algorithm for deep learning jobs. OASIS first uses an improved machine learning method to build a model that predicts the resources jobs require under colocation. Next, to quantify interference between jobs, it introduces a job combination model; the interference values computed by this model are used to proactively adjust the scheduling policy and avoid ineffective scheduling decisions, thereby reducing job completion time. Finally, experiments deployed in a real environment show that, compared with the classical FCFS, MBP, and SJF algorithms, OASIS reduces the average total job completion time by 5.7% and the average energy consumption by 4.0%. These results demonstrate the effectiveness of the proposed algorithm and its advantage over the baselines.

Key words: deep learning, interference-aware, resource prediction model, online scheduling
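
The abstract describes the scheduling idea only at a high level, so the following is a minimal sketch of what interference-aware colocation could look like, not the OASIS algorithm itself. Everything in it is an assumption made for illustration: the `Job` and `GPU` types, the use of a fixed memory figure in place of the paper's resource prediction model, the pairwise lookup table in place of its job combination model, and the `max_interference` threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    mem_gb: float  # assumed: predicted GPU memory demand (stand-in for a prediction model)


@dataclass
class GPU:
    name: str
    mem_gb: float
    jobs: list = field(default_factory=list)

    def free_mem(self) -> float:
        """Memory not yet claimed by jobs already placed on this GPU."""
        return self.mem_gb - sum(j.mem_gb for j in self.jobs)


def pair_interference(a: Job, b: Job, table: dict) -> float:
    """Hypothetical symmetric interference value for colocating a and b (0.0 if unknown)."""
    return table.get(frozenset((a.name, b.name)), 0.0)


def place(job: Job, gpus: list, table: dict, max_interference: float = 0.3):
    """Put job on the feasible GPU with the lowest added interference.

    Returns the chosen GPU, or None if no GPU fits the job or every
    candidate would exceed max_interference (the job stays queued).
    """
    best, best_cost = None, None
    for gpu in gpus:
        if gpu.free_mem() < job.mem_gb:
            continue  # predicted demand does not fit
        cost = sum(pair_interference(job, other, table) for other in gpu.jobs)
        if cost > max_interference:
            continue  # colocation judged ineffective, skip this GPU
        if best_cost is None or cost < best_cost:
            best, best_cost = gpu, cost
    if best is not None:
        best.jobs.append(job)
    return best


# Example with made-up jobs and interference values.
gpus = [GPU("gpu0", 16.0), GPU("gpu1", 16.0)]
table = {
    frozenset(("resnet", "bert")): 0.5,
    frozenset(("resnet", "lstm")): 0.1,
}
for job in [Job("resnet", 6.0), Job("bert", 8.0), Job("lstm", 4.0)]:
    target = place(job, gpus, table)
    print(job.name, "->", target.name if target else "queued")
```

The sketch handles jobs one at a time in arrival order, which is the sense in which it is online. A job that would interfere too strongly with every feasible GPU is left queued rather than colocated.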

JING Chao, BI Yu-shen. OASIS: An interference-aware online scheduling algorithm for deep learning jobs[J]. Computer Engineering & Science, 2024, 46(12): 2138-2148.
