• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2022, Vol. 44, Issue (08): 1331-1341.

• High Performance Computing

LSTM-based execution time prediction model for cluster user jobs

朱正东1,吴寅超2,胡亚红2,蒋家强1   

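  1. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, Shaanxi 710049;
    2. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou, Zhejiang 310023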

  • Received: 2021-07-16 Revised: 2021-11-11 Accepted: 2022-08-25 Online: 2022-08-25 Published: 2022-08-25
  • Funding:
    National Key Research and Development Program of China (2018YFB0204004, 2018YFB0204003)

A cluster job execution time prediction model based on LSTM

ZHU Zheng-dong1, WU Yin-chao2, HU Ya-hong2, JIANG Jia-qiang1

  1. School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049;
    2. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
  • Received:2021-07-16 Revised:2021-11-11 Accepted:2022-08-25 Online:2022-08-25 Published:2022-08-25

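Abstract: To improve quality of service, data centers must ensure that user jobs finish before their specified deadlines. This requires scheduling jobs effectively according to real-time system resources. A job scheduling algorithm is proposed that schedules batch jobs according to their predicted execution times, so as to minimize the completion time of the batch. The execution time prediction model is based on a long short-term memory (LSTM) network. It predicts the execution time of a user job from five inputs: the job type, the job volume, the number of CPU cores and amount of memory the job requires, and the proportion of total system resources the job requires. The predictions are used to judge whether the cluster can complete user jobs on time, and they provide a basis for arranging the execution order of the jobs. The values of the hyperparameters that affect the performance of the LSTM time prediction model, such as the number of iterations, the learning rate and the number of network layers, were determined experimentally. Experiments show that, compared with the SVR, ARIMA and BP models, the LSTM-based job execution time prediction model improves the coefficient of determination R² by 2.97%, 2.34% and 5.66%, respectively, and its average prediction error is only 0.78%.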

Keywords: LSTM, time prediction, job scheduling, quality of service
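The abstract says predicted execution times are used to decide whether the cluster can finish user jobs on time and to order those jobs. It does not, however, describe the scheduling algorithm itself. The Python sketch below illustrates the idea under simplifying assumptions and is not the authors' method. It treats the cluster as a single resource that runs the batch one job at a time, with all jobs available at time 0. It takes the predicted times as given, for example as output by the LSTM model. Jobs are ordered by earliest deadline first (Jackson's EDD rule). On a single sequential resource, this ordering meets every deadline whenever any ordering can. The `Job` class and `edd_schedule` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    """A job in the batch (hypothetical structure for illustration)."""
    name: str
    predicted_time: float  # predicted execution time, assumed to come from the prediction model
    deadline: float        # deadline measured from the start of the batch, same time unit


def edd_schedule(jobs: List[Job]) -> Tuple[List[Job], bool, List[float]]:
    """Order jobs by earliest deadline first and check every deadline.

    Assumes all jobs are available at time 0 and run one after another on a
    single shared resource, each taking exactly its predicted time. Returns
    the execution order, whether all deadlines are met, and the predicted
    finish time of each job in that order.
    """
    order = sorted(jobs, key=lambda job: job.deadline)
    finish_times: List[float] = []
    clock = 0.0
    feasible = True
    for job in order:
        clock += job.predicted_time   # job starts when the previous one finishes
        finish_times.append(clock)
        if clock > job.deadline:      # predicted to miss its deadline
            feasible = False
    return order, feasible, finish_times


jobs = [Job("a", 30.0, 100.0), Job("b", 50.0, 60.0), Job("c", 20.0, 120.0)]
order, feasible, finish = edd_schedule(jobs)
print([job.name for job in order], feasible, finish)
# prints ['b', 'a', 'c'] True [50.0, 80.0, 100.0]
```

A real cluster runs jobs in parallel across many cores, so a practical feasibility check would also need to account for the CPU cores and memory each job reserves. Those are the same quantities the abstract lists as model inputs.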

Abstract: To improve the quality of service (QoS), data centers need to ensure that user jobs are completed before their specified deadlines, so jobs must be scheduled efficiently according to real-time system resources. A job scheduling algorithm is proposed that schedules batch jobs according to execution times predicted by an LSTM (Long Short-Term Memory)-based model, in order to minimize the completion time of the batch. The LSTM-based time prediction model predicts the execution time of a user job from five inputs: the job type, the job volume, the number of CPU cores and amount of memory the job requires, and the ratio of the resources the job requires to the total system resources. The prediction results are used to judge whether the cluster is capable of completing user jobs on time, and they provide a basis for arranging the execution order of the jobs. The hyperparameters that affect the performance of the LSTM time prediction model, such as the number of iterations, the learning rate and the number of network layers, are determined through experiments. Experiments show that, compared with the SVR, ARIMA and BP models, the LSTM-based job execution time prediction model improves the coefficient of determination R² by 2.97%, 2.34% and 5.66%, respectively, and its average prediction error is only 0.78%.

Key words: long short-term memory (LSTM), time prediction, job scheduling, quality of service (QoS)
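The abstract reports relative R² improvements and an average error, but it does not give the formulas behind either figure. The sketch below shows one common reading, stated as an assumption. R² is computed as 1 − SS_res/SS_tot. The "average error" is taken as the mean absolute relative error. "Improves R² by 2.97%" is read as the relative gain (R²_LSTM − R²_baseline) / R²_baseline. The abstract could equally mean an absolute difference in percentage points. The function names are hypothetical.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean) ** 2 for y in y_true)               # total sum of squares
    return 1.0 - ss_res / ss_tot


def mean_relative_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |y - y_hat| / |y| as a fraction (assumes no true value is zero)."""
    return sum(abs(y - p) / abs(y) for y, p in zip(y_true, y_pred)) / len(y_true)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`; 0.0297 corresponds to 2.97%."""
    return (new - baseline) / baseline


# Illustrative measured vs. predicted execution times (made-up numbers, not the paper's data)
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [11.0, 19.0, 31.0, 39.0]
print(r_squared(measured, predicted), mean_relative_error(measured, predicted))
```

Under the alternative percentage-point reading, the same 2.97% would instead mean R²_LSTM − R²_baseline = 0.0297. Because the abstract does not say which reading applies, the reported figures cannot be converted between the two without the underlying R² values.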