• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (08): 1331-1340.

User QoS-aware deep learning task dynamic scheduling on GPU clusters

LUO Lei,CHEN Zhao-yun,WANG Li-xuan   

  1. (College of Computer Science and Technology,National University of Defense Technology,Changsha 410073,China)

  • Received:2020-06-09 Revised:2020-09-23 Accepted:2021-08-25 Online:2021-08-25 Published:2021-08-24

Abstract:

This paper proposes a QoS (Quality of Service)-aware dynamic scheduling method for deep learning tasks on GPU clusters. An offline evaluation module profiles deep learning tasks and builds a computational performance prediction model. Using this model together with each task's expected QoS, an online scheduling module decides both task placement and task execution order. Experiments on a distributed GPU cluster show that the proposed method achieves a higher QoS-guarantee percentage and higher cluster resource utilization than the baseline schedulers.
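
The two-stage structure described in the abstract (an offline performance model feeding an online scheduler) can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a hypothetical linear-speedup model `linear_model` in place of the offline-built predictor, treats the expected QoS as a completion deadline, and substitutes a plain earliest-deadline-first ordering with a greedy "fewest GPUs that still meet the deadline" placement. All names (`Task`, `schedule`, `work`, `deadline`) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    work: float      # hypothetical amount of work, e.g. training iterations
    deadline: float  # expected QoS, modeled here as a completion deadline (s)


def linear_model(task, gpus):
    """Stand-in for the offline-built prediction model (assumption):
    each GPU processes 10 work units per second with perfect scaling."""
    return task.work / (10.0 * gpus)


def schedule(tasks, total_gpus, predict=linear_model):
    """Greedy sketch: order tasks by earliest deadline, then give each task
    the fewest GPUs whose predicted finish time still meets its deadline."""
    free_at = [0.0] * total_gpus  # time at which each GPU becomes idle
    plan = {}
    for task in sorted(tasks, key=lambda t: t.deadline):
        # GPUs sorted by when they become free (ties broken by index).
        order = sorted(range(total_gpus), key=lambda i: free_at[i])
        best = None
        for g in range(1, total_gpus + 1):
            chosen = order[:g]
            # The task starts once the last of its chosen GPUs is free.
            finish = free_at[chosen[-1]] + predict(task, g)
            if best is None or finish < best[1]:
                best = (chosen, finish)  # fallback: earliest finish seen
            if finish <= task.deadline:
                best = (chosen, finish)  # smallest allocation meeting QoS
                break
        chosen, finish = best
        for i in chosen:
            free_at[i] = finish
        plan[task.name] = (chosen, finish, finish <= task.deadline)
    return plan


if __name__ == "__main__":
    demo = [Task("A", work=100, deadline=10), Task("B", work=100, deadline=5)]
    for name, (gpus, finish, ok) in schedule(demo, total_gpus=4).items():
        print(name, "GPUs:", gpus, "finish:", finish, "QoS met:", ok)
```

In this toy run, task B has the tighter deadline, so it is scheduled first and receives two GPUs, while task A fits on a single remaining GPU. A real predictor would replace `linear_model` with measurements gathered during offline evaluation.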

Key words: deep learning, GPU cluster, task scheduling, QoS