  • Official Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


A loadadaptive feedback scheduling
strategy for heterogeneous Hadoop cluster

PAN Jiayi1,2,3,WANG Fang1,2,3,YANG Jingyi1,2,3,TAN Zhipeng1,2,3   

  1. Wuhan National Lab for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China;
  2. School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan 430074, China;
  3. Key Laboratory of Information Storage System, Ministry of Education, Huazhong University of Science and Technology, Wuhan 430074, China
  • Received: 2016-09-03 Revised: 2016-11-01 Online: 2017-03-25 Published: 2017-03-25

Abstract:

With the development and practice of big data technology, the Hadoop YARN (Yet Another Resource Negotiator) scheduler is no longer an effective solution in heterogeneous cluster environments. On the one hand, YARN cannot dynamically allocate node resources, which wastes the resources of more capable nodes and degrades overall system performance. On the other hand, YARN's existing static resource allocation policy ignores the differences between execution stages, which causes a large amount of resource fragmentation. To address these problems, we propose a load-adaptive feedback scheduling strategy. The system monitors the performance of all nodes and jobs and evaluates the computing power of each node from real-time monitoring data. The scheduler then applies a dynamic resource scheduling strategy based on a similarity assessment combined with the monitored node and job performance information. The optimized system can distinguish the heterogeneity of different nodes, dynamically allocate resources according to tasks' real-time needs, refine YARN's scheduling semantics, and serve as a secondary resource scheduling strategy under the upper-level scheduler. We implemented and tested the strategy on Hadoop 2.0, and the experimental results show that it significantly improves resource utilization, increases cluster concurrency 2- to 3-fold, and improves performance by nearly 10%.

Key words: heterogeneous cluster, monitoring, computing power, dynamic scheduling, load-adaptive
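The abstract describes the mechanism only at a high level: node computing power is evaluated from real-time monitoring data, and the scheduler then places work using a similarity assessment. The Python sketch below illustrates one way such a placement step could look. It is not the authors' implementation, and every name in it is hypothetical: the monitored fields (`free_vcores`, `free_mem_mb`, `tasks_per_min`), the use of the observed task completion rate as the computing-power signal, cosine similarity between a task's demand vector and a node's free-resource vector as the "similarity assessment", and the weight `alpha` are all assumptions made for illustration.

```python
# Hypothetical sketch: NOT the paper's algorithm. Field names, the
# computing-power signal, and the scoring rule are illustrative assumptions.
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class NodeStatus:
    """One real-time monitoring sample for a node (hypothetical fields)."""
    name: str
    total_vcores: float
    total_mem_mb: float
    free_vcores: float
    free_mem_mb: float
    tasks_per_min: float  # observed completion rate, assumed power signal


@dataclass
class TaskDemand:
    """Real-time resource need of one task (hypothetical fields)."""
    vcores: float
    mem_mb: float


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two resource vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def computing_power(node: NodeStatus, max_rate: float) -> float:
    """Normalize the node's completion rate against the fastest node, in [0, 1]."""
    return node.tasks_per_min / max_rate if max_rate > 0 else 0.0


def pick_node(task: TaskDemand, nodes: List[NodeStatus],
              alpha: float = 0.5) -> Optional[str]:
    """Return the best-scoring node that can host the task, or None.

    Assumed score: alpha * similarity + (1 - alpha) * computing power.
    """
    max_rate = max((n.tasks_per_min for n in nodes), default=0.0)
    best_name: Optional[str] = None
    best_score = -math.inf
    for n in nodes:
        if n.free_vcores < task.vcores or n.free_mem_mb < task.mem_mb:
            continue  # not enough free resources on this node right now
        # Express demand and free capacity as fractions of the node's total,
        # so vcores and megabytes become comparable dimensions.
        demand = (task.vcores / n.total_vcores, task.mem_mb / n.total_mem_mb)
        free = (n.free_vcores / n.total_vcores, n.free_mem_mb / n.total_mem_mb)
        score = (alpha * cosine_similarity(demand, free)
                 + (1 - alpha) * computing_power(n, max_rate))
        if score > best_score:
            best_name, best_score = n.name, score
    return best_name


if __name__ == "__main__":
    nodes = [
        NodeStatus("fast", total_vcores=16, total_mem_mb=32768,
                   free_vcores=8, free_mem_mb=8192, tasks_per_min=30),
        NodeStatus("slow", total_vcores=8, total_mem_mb=16384,
                   free_vcores=6, free_mem_mb=12288, tasks_per_min=10),
    ]
    print(pick_node(TaskDemand(vcores=2, mem_mb=4096), nodes))  # prints fast
```

With these assumed numbers, `fast` is chosen because its higher completion rate outweighs a slightly worse shape match; passing `alpha=1.0` (similarity only) selects `slow`, whose free resources have exactly the same CPU-to-memory proportions as the demand. Normalizing both vectors by node capacity is just one simple way to make vcores and megabytes comparable.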