• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (07): 1173-1184.

• High Performance Computing •

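Research on task scheduling optimization of Flink for container environments

HUANG Shan1,2,3, FANG Liu-yi1,2,3, XU Hao-tong1,2,3, DUAN Xiao-dong1,2,3

  1. (1. School of Computer Science and Engineering, Dalian Minzu University, Dalian 116600, Liaoning; 2. State Ethnic Affairs Commission Key Laboratory of Big Data Applied Technology, Dalian 116600, Liaoning;

    3. Dalian Key Laboratory of Digital Technology for National Culture, Dalian 116600, Liaoning)
  • Received: 2021-02-04 Revised: 2021-04-12 Accepted: 2021-07-25 Online: 2021-07-25 Published: 2021-08-16
  • Supported by:
    National Key Research and Development Program of China (2018YFB1004402)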

Task scheduling optimization of Flink in container environment

HUANG Shan1,2,3, FANG Liu-yi1,2,3, XU Hao-tong1,2,3, DUAN Xiao-dong1,2,3
  

  1. (1.College of Computer Science and Technology,Dalian Minzu University,Dalian 116600;

    2.State Ethnic Affairs Commission Key Laboratory of Big Data Applied Technology,Dalian 116600;

    3.Dalian Key Laboratory of Digital Technology for National Culture,Dalian 116600,China)

  • Received:2021-02-04 Revised:2021-04-12 Accepted:2021-07-25 Online:2021-07-25 Published:2021-08-16

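Abstract: With the rapid development of Internet technology, society is entering the era of big data and cloud computing. As a latest-generation big data computing engine, Flink offers low latency and high throughput and is favored by both academia and industry. When Flink is deployed in a cloud environment, however, its default task scheduler cannot obtain information about how containers are distributed across nodes, which leads to unbalanced load allocation. To address this problem, a Flink task scheduling algorithm for container environments, FSACE, is proposed. It collects the performance information of each node and the distribution of containers across nodes, preferentially selects containers on nodes with more free resources, and at the same time avoids the load imbalance caused by selecting the same container too frequently. The algorithm is evaluated on cloud hosts with a synthetic dataset. The results show that, when deployed in a container environment, the proposed algorithm distributes tasks more evenly and improves resource utilization and computing speed.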


Key words: container, big data, Flink, task scheduling, load balancing, container environment

Abstract: With the rapid development of Internet technology, society is moving into the era of big data and cloud computing. As a latest-generation big data computing engine, Flink is favored by academia and industry for advantages such as low latency and high throughput. When Flink is deployed in a cloud environment, its default task scheduling leads to uneven load distribution because it cannot obtain information about how containers are deployed across nodes. To solve this problem, this paper proposes a load-balancing task scheduling algorithm for Flink in container environments. The algorithm obtains the performance information of each node and the distribution of containers on the nodes, gives priority to containers on nodes with more free resources, and avoids the uneven load caused by selecting the same containers too frequently. The evaluation results show that, when deployed in a container environment, the proposed algorithm allocates tasks more evenly and improves resource utilization and computing speed.


Key words: container, big data, Flink, task scheduling, load balancing, container environment
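
The abstract describes the scheduling idea only at a high level, so the following is a minimal illustrative sketch and not the paper's actual algorithm. It assumes a single hypothetical free-resource score per node (`node_free`), a hypothetical `Container` record, and a simple rule: a node's score is shared among the tasks already placed on it, and ties go to the container chosen least often. None of these names or formulas come from the paper or from Flink's API.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical record for one container (not a Flink class)."""
    name: str
    node: str          # node that hosts this container
    assigned: int = 0  # tasks already placed on this container


def pick_container(containers, node_free):
    """Choose a container for the next task and record the assignment.

    node_free maps node name -> assumed free-resource score (higher = more idle).
    A node's score is divided by (1 + tasks already placed on that node), so the
    node becomes less attractive as it fills up. Ties are broken in favour of the
    container selected least often, then by name for determinism.
    """
    node_load = {}
    for c in containers:
        node_load[c.node] = node_load.get(c.node, 0) + c.assigned

    def key(c):
        share = node_free[c.node] / (1 + node_load[c.node])
        return (-share, c.assigned, c.name)

    best = min(containers, key=key)
    best.assigned += 1
    return best


def schedule(num_tasks, containers, node_free):
    """Assign num_tasks tasks one by one; return the chosen container names."""
    return [pick_container(containers, node_free).name for _ in range(num_tasks)]


if __name__ == "__main__":
    containers = [Container("a1", "A"), Container("a2", "A"), Container("b1", "B")]
    print(schedule(6, containers, {"A": 8.0, "B": 4.0}))
```

In this toy setup, node A is assumed to have twice the free resources of node B, so it receives twice as many tasks, and its two containers take turns instead of one being picked repeatedly.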