• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (06): 951-961.


Scheduling and optimization of multi-job execution in distributed environment

JI Hang-xu1, JIANG Su1, ZHAO Yu-hai1, WU Gang1, WANG Guo-ren2

  1. School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China

  2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China

  • Received:2020-10-03 Revised:2020-12-30 Accepted:2021-06-25 Online:2021-06-25 Published:2021-06-22

Abstract: Distributed big data computing engines are indispensable tools with which scientific research institutions, Internet companies, and government departments process large-scale data. Their adoption has driven rapid development in many fields and contributed to social progress. However, when processing multiple jobs, current mainstream big data computing engines still have shortcomings in resource allocation and job scheduling. They usually divide memory resources equally among the jobs and schedule the jobs in first-in-first-out (FIFO) order; such simple resource partitioning and job scheduling cannot fully exploit system performance. To address this problem, improvements are made at the job level of the computing engine: (1) for resource division, job features are extracted to estimate each job's task amount, the gap between that task amount and the job's pre-allocated resources is assessed, and jobs that would waste a large share of cluster resources are merged so that computing resources are fully utilized; (2) for job scheduling, the features of the jobs in the job pool are extracted, the jobs are clustered with a multipath K-means algorithm, and a self-balancing polling scheduling algorithm then schedules the jobs based on the clustering results to achieve load balance. To verify the effectiveness of the proposed algorithms, comparative experiments were conducted in a distributed cluster environment using large-scale text data sets. The experimental results show that the proposed job merging algorithm and multi-job scheduling algorithm reduce job running time by 5% to 23%, improve system throughput by 7.5% to 29%, and reduce the number of threads started by 40% in the best case.
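
The cluster-then-poll idea in step (2) can be illustrated with a minimal sketch. It is not the authors' implementation: it substitutes plain Lloyd's K-means for the multipath K-means variant, uses a simple round-robin over clusters rather than the self-balancing polling algorithm, and the job identifiers, the two-dimensional feature vectors, and the function names `kmeans` and `round_robin_schedule` are all hypothetical.

```python
from itertools import zip_longest
from typing import List, Sequence, Tuple

# (job id, feature vector); the features are hypothetical, e.g. estimated
# input size and operator count.
Job = Tuple[str, Sequence[float]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain Lloyd's K-means (an assumption, not the paper's multipath variant).

    Returns the cluster index assigned to each point.
    """
    # Deterministic initialisation: pick k points spread evenly through the input.
    step = max(len(points) // k, 1)
    centroids = [list(points[min(i * step, len(points) - 1)]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def round_robin_schedule(jobs: List[Job], k: int) -> List[str]:
    """Cluster jobs by feature, then take one job from each cluster in turn."""
    labels = kmeans([features for _, features in jobs], k)
    groups: List[List[str]] = [[] for _ in range(k)]
    for (job_id, _), label in zip(jobs, labels):
        groups[label].append(job_id)
    order: List[str] = []
    # Interleave clusters so that similar jobs are not submitted back to back.
    for batch in zip_longest(*groups):
        order.extend(job_id for job_id in batch if job_id is not None)
    return order


jobs = [
    ("a", (1.0, 1.0)),
    ("b", (1.2, 0.9)),
    ("c", (9.0, 9.5)),
    ("d", (8.8, 9.1)),
]
print(round_robin_schedule(jobs, 2))  # ['a', 'c', 'b', 'd']
```

Interleaving clusters in this way alternates jobs with different resource profiles, which is one simple route to the load balance that the abstract describes.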


Key words: distributed, job merging, cluster, polling scheduling, Flink