• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2022, Vol. 44, Issue 07: 1141-1151.

• High Performance Computing •

Research and implementation of a Flink-oriented load balancing task scheduling algorithm

LI Wen-jia¹, SHI Lan¹, JI Hang-xu¹, LUO Yi-peng²

  1. College of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
  2. School of Software, Liaoning University of Technology, Jinzhou 121000, China

  • Received: 2021-11-10 Revised: 2022-01-17 Accepted: 2022-07-25 Online: 2022-07-25 Published: 2022-07-25

Abstract: Apache Flink is one of the mainstream distributed computing engines for big data, and task scheduling is a key problem in distributed computing systems. Because clusters are heterogeneous and operators differ in complexity, load imbalance inevitably arises in Flink. To address this problem, a load balancing task scheduling algorithm based on resource feedback, named RFTS, is proposed. RFTS consists of three modules: real-time resource monitoring, area division, and a task scheduling algorithm based on glowworm swarm optimization. Together they reallocate tasks in the waiting queues of overloaded machines to more lightly loaded machines, which reduces load imbalance across the cluster and improves cluster utilization and execution efficiency. Experiments on the TPC-C and TPC-H datasets show that RFTS effectively improves the performance of Apache Flink in terms of execution time and throughput.


Key words: Apache Flink, load balancing task scheduling algorithm based on resource feedback, real-time resource monitoring, area division, glowworm swarm optimization algorithm
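
The rebalancing idea in the abstract (classify machines by load, then move waiting tasks from overloaded machines to lighter ones) can be illustrated with a minimal sketch. This is not the RFTS implementation: the abstract gives no algorithmic detail, so the `Machine` class, the `divide_areas` and `rebalance` functions, the abstract cost units, and the 0.8 / 0.5 thresholds are all hypothetical. In particular, a simple greedy "least-loaded target" rule stands in for the glowworm swarm optimization search used by the authors.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    capacity: float                               # abstract resource units (assumed)
    running: float = 0.0                          # cost of tasks already executing
    waiting: list = field(default_factory=list)   # costs of queued tasks

    @property
    def load(self) -> float:
        """Fraction of capacity demanded by running plus queued work."""
        return (self.running + sum(self.waiting)) / self.capacity


def divide_areas(machines, high=0.8, low=0.5):
    """Split machines into overloaded and lightly loaded groups (assumed thresholds)."""
    overloaded = [m for m in machines if m.load > high]
    light = [m for m in machines if m.load < low]
    return overloaded, light


def rebalance(machines, high=0.8, low=0.5):
    """Move queued tasks off overloaded machines; return the migrations made."""
    migrations = []
    overloaded, light = divide_areas(machines, high, low)
    for src in overloaded:
        # Consider the largest queued tasks first; sorted() copies the queue,
        # so removing from src.waiting inside the loop is safe.
        for cost in sorted(src.waiting, reverse=True):
            if src.load <= high:
                break  # source machine is no longer overloaded
            # Only accept targets that would stay at or below the high threshold.
            candidates = [
                m for m in light
                if (m.running + sum(m.waiting) + cost) / m.capacity <= high
            ]
            if not candidates:
                continue
            # Greedy stand-in for the GSO-based choice: pick the least-loaded target.
            dst = min(candidates, key=lambda m: m.load)
            src.waiting.remove(cost)
            dst.waiting.append(cost)
            migrations.append((src.name, dst.name, cost))
    return migrations


machines = [
    Machine("tm-1", capacity=10, running=5, waiting=[3, 2, 2]),
    Machine("tm-2", capacity=10, running=2),
    Machine("tm-3", capacity=20, running=2),
]
migrations = rebalance(machines)
for src, dst, cost in migrations:
    print(f"moved task of cost {cost} from {src} to {dst}")
```

In this toy run the load values play the role of the feedback from real-time resource monitoring; a real scheduler would refresh them from live measurements between rounds rather than from static cost estimates.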