• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (12): 2138-2148.

• High Performance Computing •

OASIS: An interference-aware online scheduling algorithm for deep learning jobs

JING Chao (1,2), BI Yu-shen (1)

  (1. College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006;
    2. Guangxi Key Laboratory of Embedded Technology and Intelligent System,
    Guilin University of Technology, Guilin 541006, China)
  • Received: 2023-06-06  Revised: 2023-10-06  Accepted: 2024-12-25  Online: 2024-12-25  Published: 2024-12-23

Abstract: Because GPUs accelerate deep learning jobs, much research aims to reduce job completion time by improving GPU utilization. Unlike the traditional approach of dedicating a GPU to a single job, this paper considers job colocation, i.e., running multiple jobs simultaneously on the same GPU to improve utilization and reduce completion time, and proposes OASIS, an interference-aware online scheduling algorithm for deep learning jobs. The algorithm first uses an improved machine learning approach to build a model that predicts the resources jobs require under colocation. A job combination model is then designed to calculate the interference values between jobs, and these values are used to proactively adjust the scheduling strategy so that ineffective scheduling is avoided and job completion time is reduced. Finally, experiments in a real-world environment show that, compared with the classical FCFS, MBP, and SJF algorithms, OASIS reduces the average total job completion time by 5.7% and the average energy consumption by 4.0%, demonstrating the effectiveness of the proposed algorithm.

Key words: deep learning, interference-aware, resource prediction model, online scheduling
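
The pipeline outlined in the abstract (predict each job's resource needs, compute an interference value for a candidate job combination, then place the job only where interference is acceptable) can be illustrated with a minimal Python sketch. This is not the OASIS algorithm itself: the abstract does not specify the prediction model or the job combination model, so the `Job` fields, the `interference` formula (excess of combined GPU utilization over 1.0), the `max_interference` threshold, and all numeric values below are hypothetical stand-ins chosen only to show the control flow.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    gpu_util: float  # predicted GPU utilization in [0, 1] (hypothetical predictor output)
    mem_gb: float    # predicted GPU memory demand in GB (hypothetical predictor output)


@dataclass
class Gpu:
    name: str
    mem_gb: float
    jobs: list = field(default_factory=list)


def interference(a: Job, b: Job) -> float:
    """Toy interference value: how far the combined utilization exceeds 1.0.

    Assumption for illustration only; the paper's job combination model
    is not described in the abstract.
    """
    return max(0.0, a.gpu_util + b.gpu_util - 1.0)


def place(job: Job, gpus: list, max_interference: float = 0.2):
    """Place a job on the feasible GPU with the lowest added interference.

    Returns the chosen Gpu, or None when every placement would exceed the
    memory capacity or the interference threshold (the job then waits).
    """
    best, best_cost = None, None
    for gpu in gpus:
        used = sum(j.mem_gb for j in gpu.jobs)
        if used + job.mem_gb > gpu.mem_gb:
            continue  # not enough memory for colocation on this GPU
        cost = sum(interference(job, other) for other in gpu.jobs)
        if cost > max_interference:
            continue  # colocation here would slow the jobs down too much
        if best is None or cost < best_cost:
            best, best_cost = gpu, cost
    if best is not None:
        best.jobs.append(job)
    return best


# Jobs arrive one at a time (online setting); values are made up.
gpus = [Gpu("gpu0", 16.0), Gpu("gpu1", 16.0)]
arrivals = [
    Job("a", 0.7, 6.0),
    Job("b", 0.6, 6.0),
    Job("c", 0.2, 4.0),
    Job("d", 0.9, 8.0),
]
placements = {}
for job in arrivals:
    chosen = place(job, gpus)
    placements[job.name] = chosen.name if chosen else None
```

In this toy run, jobs `a` and `c` share `gpu0` because their combined utilization stays below 1.0, `b` is sent to the empty `gpu1` rather than being colocated with `a`, and `d` is held back because no placement satisfies both the memory and the interference constraint. A real scheduler would also need a waiting queue and a way to release resources when jobs finish, which this sketch omits.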