• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (1): 28-39.

• High Performance Computing

Characteristics analysis and runtime prediction of jobs in supercomputer

YANG Hongzhen, CHENG Wei, DU Liang, HUANG Dan, ZENG Chuxuan, XIAO Nong

  1. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
  2. Computing Network Research and Operation Department, Guangdong Unicom, Guangzhou 510630, China
  • Received: 2024-11-26 Revised: 2024-12-28 Online: 2026-01-25 Published: 2026-01-25

Abstract: Job logs from high-performance computing (HPC) clusters can be used to analyze system workloads and to identify periodic patterns in system usage, correlations among job characteristics, and user behavior patterns. Such analysis in turn supports the development of a runtime prediction model that reduces the error in estimated job runtimes and improves the performance of backfilling job scheduling. Existing prediction algorithms focus primarily on improving the average accuracy of runtime predictions but overlook cases where the predicted value falls below the actual runtime (underprediction), which may cause the scheduler to terminate running jobs prematurely and thereby reduce the effective utilization of system resources. To address this issue, and based on an analysis of the long-term trends and correlations of HPC job characteristics, this paper proposes an ensemble learning model to predict job runtimes and introduces an ordered extended maximum strategy to adjust the ensemble model's predictions. Experimental results show that the proposed model significantly reduces the underprediction rate while maintaining high prediction accuracy, and that it exhibits good stability and generalization.

Key words: high-performance computing, large-scale system, characteristics analysis, runtime prediction, ensemble learning
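
The abstract weighs two quantities against each other: average prediction accuracy and the underprediction rate. A minimal Python sketch of how these might be computed from a job log is given here for illustration. It assumes the underprediction rate is the fraction of jobs whose predicted runtime falls below the actual runtime, and it uses mean absolute error as a stand-in accuracy measure; the paper's own metric definitions may differ. The function names and sample values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def underprediction_rate(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Fraction of jobs whose predicted runtime is below the actual runtime.

    Assumed definition for illustration; the paper may define it differently.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one job is required")
    # Count jobs that a scheduler enforcing the prediction could cut short.
    under = sum(1 for p, a in zip(predicted, actual) if p < a)
    return under / len(actual)


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute gap between predicted and actual runtimes (seconds)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one job is required")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical runtimes in seconds, not data from the paper.
predicted_runtimes = [100.0, 250.0, 400.0, 30.0]
actual_runtimes = [120.0, 200.0, 400.0, 45.0]

rate = underprediction_rate(predicted_runtimes, actual_runtimes)
mae = mean_absolute_error(predicted_runtimes, actual_runtimes)
print(f"underprediction rate: {rate:.2f}, MAE: {mae:.2f} s")
```

Reporting both numbers side by side makes the trade-off visible: inflating every prediction lowers the underprediction rate but raises the average error, which is why the two are tracked separately.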