• A journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• High Performance Computing

• Funding:

    National Key Research and Development Program of China (2018YFB0204003)

Task scheduling optimization in a Spark environment with unbalanced resources

HU Ya-hong1, SHENG Xia2, MAO Jia-fa1

  (1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China;
   2. Bank of Ningbo, Ningbo 315100, China)
  • Received:2019-08-15 Revised:2019-11-04 Online:2020-02-25 Published:2020-02-25


Abstract:

As hardware resources are upgraded, the computing capacity of the nodes in a cluster becomes inconsistent, and this heterogeneity leads to an imbalance in cluster computing resources. At present, the Spark big data platform does not consider cluster heterogeneity or node resource utilization during task scheduling, which limits system performance. This paper constructs a performance evaluation index system for cluster nodes and proposes using a node's priority to express its computing capacity. The proposed node priority adjustment algorithm dynamically adjusts the priority of each node according to its status during task execution. The Spark Dynamic Adaptive Scheduling Algorithm (SDASA), based on node priority, then assigns tasks according to real-time node priority values. Experiments show that SDASA shortens the execution time of tasks in the cluster and thus improves the cluster's overall computing performance.

Key words: heterogeneous cluster, task scheduling, node priority, Spark
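The scheduling idea in the abstract — score each node by a weighted combination of resource metrics, send each task to the currently highest-priority node, and re-adjust priorities as tasks consume capacity — can be sketched as below. This is a minimal illustration only: the metric names (`cpu`, `mem`, `io`), the weights, and the fixed 0.8 decay factor are assumptions for the sketch, not the paper's evaluation index system or its actual adjustment rule.

```python
# Hypothetical sketch of priority-based task assignment in the spirit of SDASA.
# Metric names, weights, and the decay factor are illustrative assumptions.

def node_priority(metrics, weights):
    """Weighted sum of normalized resource-availability metrics (higher = more capable)."""
    return sum(weights[k] * metrics[k] for k in weights)

def assign_tasks(tasks, nodes, weights):
    """Greedily send each task to the highest-priority node, then degrade that
    node's available resources to mimic dynamic priority re-adjustment."""
    schedule = []
    for task in tasks:
        best = max(nodes, key=lambda n: node_priority(nodes[n], weights))
        schedule.append((task, best))
        # Crude dynamic adjustment: an assigned task consumes part of the
        # node's capacity, lowering its priority for the next decision.
        for k in nodes[best]:
            nodes[best][k] *= 0.8
    return schedule

weights = {"cpu": 0.5, "mem": 0.3, "io": 0.2}
nodes = {
    "worker1": {"cpu": 0.9, "mem": 0.8, "io": 0.7},  # newer, faster node
    "worker2": {"cpu": 0.5, "mem": 0.6, "io": 0.6},  # older node
}
print(assign_tasks(["t1", "t2", "t3"], nodes, weights))
```

Note how the faster node receives the first tasks, but as its simulated load grows, its priority drops below the slower node's, so later tasks spill over — the behavior a static scheduler that ignores heterogeneity cannot reproduce.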