
Computer Engineering & Science, 2022, Vol. 44, Issue (07): 1141-1151.

• High Performance Computing •

Research and implementation of a Flink-oriented load balancing task scheduling algorithm

LI Wen-jia1, SHI Lan1, JI Hang-xu1, LUO Yi-peng2

  (1. College of Computer Science and Engineering, Northeastern University, Shenyang, Liaoning 110169, China;
    2. School of Software, Liaoning University of Technology, Jinzhou, Liaoning 121000, China)

  • Received: 2021-11-10 Revised: 2022-01-17 Accepted: 2022-07-25 Online: 2022-07-25 Published: 2022-07-25
  • Funding:
    Key Research and Development Project of the Ministry of Science and Technology (2018YFB1004402)

Abstract: Apache Flink is one of the mainstream distributed computing engines for big data, and task scheduling is a key problem in distributed computing systems. Because clusters are heterogeneous and operators differ in complexity, load imbalance inevitably arises in Flink. To address this problem, a load balancing task scheduling algorithm based on resource feedback, named RFTS, is proposed. RFTS consists of three modules: real-time resource monitoring, area division, and a task scheduling algorithm based on glowworm swarm optimization. Together they reassign tasks that are waiting on overloaded machines to lightly loaded machines, which balances the load across the cluster and improves cluster utilization and execution efficiency. Experiments on the TPC-C and TPC-H datasets show that RFTS effectively improves the performance of Apache Flink in terms of both execution time and throughput.


Key words: Apache Flink, load balancing task scheduling algorithm based on resource feedback, real-time resource monitoring, area division, glowworm swarm optimization algorithm
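
The core idea described in the abstract (monitor load, divide machines into areas, then move waiting tasks from overloaded machines to lightly loaded ones) can be illustrated with a minimal sketch. This is not the RFTS implementation: the abstract gives no thresholds, data structures, or details of the glowworm swarm optimization step, so the `Node` fields, the `high`/`low` thresholds, and the function names below are all hypothetical, and a simple greedy choice stands in for the glowworm swarm optimization module.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical snapshot of one machine, as a resource monitor might report it."""
    name: str
    capacity: float               # relative processing capacity (cluster is heterogeneous)
    running_load: float           # load of tasks that are already executing
    waiting: List[float] = field(default_factory=list)  # costs of queued tasks

    @property
    def utilization(self) -> float:
        return (self.running_load + sum(self.waiting)) / self.capacity


def divide_regions(nodes: List[Node], high: float, low: float) -> Dict[str, List[Node]]:
    """Split nodes into overloaded / normal / light areas (thresholds are assumptions)."""
    regions: Dict[str, List[Node]] = {"overloaded": [], "normal": [], "light": []}
    for node in nodes:
        if node.utilization > high:
            regions["overloaded"].append(node)
        elif node.utilization < low:
            regions["light"].append(node)
        else:
            regions["normal"].append(node)
    return regions


def rebalance(nodes: List[Node], high: float = 0.8, low: float = 0.5) -> List[Tuple[str, str, float]]:
    """Greedy stand-in for the optimization step: move waiting tasks, largest first,
    from each overloaded node to the least-utilized node of the light area."""
    regions = divide_regions(nodes, high, low)
    moves: List[Tuple[str, str, float]] = []
    for src in regions["overloaded"]:
        for cost in sorted(src.waiting, reverse=True):
            if src.utilization <= high:
                break  # source is no longer overloaded
            # Only accept targets that stay at or below the high threshold after the move.
            candidates = [
                dst for dst in regions["light"]
                if (dst.running_load + sum(dst.waiting) + cost) / dst.capacity <= high
            ]
            if not candidates:
                continue  # this task fits nowhere; try a smaller one
            dst = min(candidates, key=lambda n: n.utilization)
            src.waiting.remove(cost)
            dst.waiting.append(cost)
            moves.append((src.name, dst.name, cost))
    return moves
```

Only waiting tasks are moved, which mirrors the abstract's description; tasks that are already running stay where they are. A real scheduler would replace the greedy `min` selection with the optimization method of its choice and would feed `Node` from live monitoring data.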