
Computer Engineering & Science


Task scheduling optimization in Spark environment with unbalanced resources

HU Ya-hong1, SHENG Xia2, MAO Jia-fa1

  (1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China;
   2. Bank of Ningbo, Ningbo 315100, China)

     
  • Received: 2019-08-15  Revised: 2019-11-04  Online: 2020-02-25  Published: 2020-02-25

Abstract:

As hardware resources are upgraded over time, the computing capacities of nodes in a cluster become inconsistent. The resulting cluster heterogeneity leads to an imbalance of cluster computing resources. At present, the Spark big data platform considers neither cluster heterogeneity nor node resource utilization in task scheduling, which degrades system performance. This paper constructs a performance evaluation index system for cluster nodes and proposes using node priority to express each node's computing capacity. The proposed node priority adjustment algorithm dynamically adjusts the priority of each node according to its status during task execution, and the Spark Dynamic Adaptive Scheduling Algorithm (SDASA), which is based on node priority, assigns tasks according to real-time node priorities. Experiments show that SDASA shortens task execution time in the cluster and improves the overall computing performance of the cluster.
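
The abstract does not give the paper's actual evaluation indices or weighting scheme, so the following Scala code is only a minimal sketch of priority-based task assignment, assuming illustrative metrics (free CPU, free memory, recent average task time) and hand-picked weights; all names and values are hypothetical and do not reproduce the SDASA implementation.

// Minimal sketch of node-priority-based task assignment.
// Metric names and weights are illustrative assumptions, not the paper's index system.

// Hypothetical snapshot of a node's monitored state.
case class NodeStatus(id: String, cpuFree: Double, memFree: Double, avgTaskTime: Double)

object PrioritySketch {
  // Assumed weights for combining metrics into a single priority score.
  val wCpu   = 0.4
  val wMem   = 0.4
  val wSpeed = 0.2

  // Higher priority means more spare capacity and faster recent task completion.
  def priority(s: NodeStatus): Double =
    wCpu * s.cpuFree + wMem * s.memFree + wSpeed / (1.0 + s.avgTaskTime)

  // Recompute priorities from the latest node status and pick the best node,
  // mirroring the idea of assigning tasks by real-time node priority.
  def pickNode(nodes: Seq[NodeStatus]): NodeStatus =
    nodes.maxBy(priority)

  def main(args: Array[String]): Unit = {
    val nodes = Seq(
      NodeStatus("node-1", cpuFree = 0.7, memFree = 0.5, avgTaskTime = 2.0),
      NodeStatus("node-2", cpuFree = 0.3, memFree = 0.8, avgTaskTime = 1.5),
      NodeStatus("node-3", cpuFree = 0.9, memFree = 0.6, avgTaskTime = 3.0)
    )
    println(s"Next task goes to ${pickNode(nodes).id}")
  }
}

In this sketch the priority is recomputed from the most recent node status each time a task is assigned, which captures the dynamic-adjustment idea at the level of the abstract; the actual algorithm in the paper may update priorities on different events and with different metrics.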
 

Key words: heterogeneous cluster, task scheduling, node priority, Spark