• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (07): 1181-1190.

• High Performance Computing •

Cluster job runtime prediction based on NR-Transformer

CHEN Feng-xian   

  1. (Office of Network Security and Information, Lanzhou University, Lanzhou 730000, China)
  • Received: 2021-04-02 Revised: 2021-09-14 Accepted: 2022-07-25 Online: 2022-07-25 Published: 2022-07-25

Abstract: On high-performance clusters, job scheduling is usually handled by a job scheduling system, and supplying an accurate job runtime can greatly improve scheduling efficiency. Existing studies usually apply machine learning to predict the runtime, but their prediction accuracy and practicality leave room for improvement. To improve the accuracy of cluster job runtime prediction, the cluster job logs are first clustered and the resulting job category is added to the job features. The job log data is then modeled and predicted with the attention-based NR-Transformer network. In data processing, seven features are selected from the historical log dataset according to their correlation with the prediction target, the completeness of each feature, and the validity of the data. The dataset is divided into multiple job sets according to the length of the job runtime, and each job set is trained and predicted separately. The experimental results show that time-series neural network structures achieve better prediction performance than traditional machine learning and BP neural networks, and that NR-Transformer achieves better performance on each job set.
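The data-processing steps described in the abstract (cluster the logs, attach the category as a feature, split the jobs by runtime length) can be illustrated roughly as follows. This is a minimal sketch, not the author's implementation: the abstract does not name the clustering algorithm, the seven features, or the runtime boundaries, so the plain k-means routine, the field names `cpus`, `requested` and `runtime`, and the boundaries of one hour and one day are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point.

    Assumption: the abstract does not say which clustering method is used.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def split_by_runtime(jobs, boundaries):
    """Group jobs into sets by runtime; boundaries are ascending limits in seconds."""
    sets = defaultdict(list)
    for job in jobs:
        # Index of the set = number of boundaries the runtime exceeds.
        idx = sum(job["runtime"] > b for b in boundaries)
        sets[idx].append(job)
    return dict(sets)


# Hypothetical log records; real logs would carry the seven selected features.
jobs = [
    {"cpus": 1, "requested": 600, "runtime": 300},
    {"cpus": 2, "requested": 900, "runtime": 500},
    {"cpus": 1, "requested": 1200, "runtime": 1000},
    {"cpus": 64, "requested": 86400, "runtime": 40000},
    {"cpus": 128, "requested": 90000, "runtime": 80000},
    {"cpus": 64, "requested": 100000, "runtime": 90000},
]

# Step 1: cluster the logs and attach the cluster id as an extra feature.
# Features are used unscaled here; real data would normally be standardised.
points = [[j["cpus"], j["requested"]] for j in jobs]
for job, label in zip(jobs, kmeans(points, k=2)):
    job["category"] = label

# Step 2: split into job sets by runtime length (boundaries are assumed).
job_sets = split_by_runtime(jobs, boundaries=[3600, 86400])
```

Each entry of `job_sets` would then be used to train and evaluate its own predictor, which is where a sequence model such as the NR-Transformer described in the abstract would come in; that model is not sketched here.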

Keywords: high performance computing, parallel job scheduling, user clustering, time-series neural network, attention mechanism