• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


A load-adaptive feedback scheduling strategy for heterogeneous Hadoop clusters

PAN Jiayi1,2,3, WANG Fang1,2,3, YANG Jingyi1,2,3, TAN Zhipeng1,2,3

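  1. (1. Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430074; 2. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074;
    3. Key Laboratory of Information Storage System, Ministry of Education, Huazhong University of Science and Technology, Wuhan, Hubei 430074)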
  • Received: 2016-09-03 Revised: 2016-11-01 Online: 2017-03-25 Published: 2017-03-25
  • Supported by:

    National 863 Program of China (2013AA013203)

A loadadaptive feedback scheduling
strategy for heterogeneous Hadoop cluster

PAN Jiayi1,2,3,WANG Fang1,2,3,YANG Jingyi1,2,3,TAN Zhipeng1,2,3   

  1. (1. Wuhan National Lab for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074;
    2. School of Computer Science & Technology, Huazhong University of Science and Technology, Wuhan 430074;
    3. Key Laboratory of Information Storage System, Ministry of Education,
    Huazhong University of Science and Technology, Wuhan 430074, China)
  • Received:2016-09-03 Revised:2016-11-01 Online:2017-03-25 Published:2017-03-25

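Abstract (Chinese version):

As Hadoop-based big data technology matures and sees wider deployment, the Hadoop YARN resource scheduling policy is increasingly ill-suited to heterogeneous clusters. On the one hand, node resources cannot be allocated dynamically, so the computing resources of stronger nodes are wasted and system performance is not fully exploited. On the other hand, the existing static resource allocation policy ignores how a job's needs differ across its execution stages, which easily produces a large amount of resource fragmentation. To address these problems, a load-adaptive scheduling strategy is proposed. It monitors performance information for the cluster's execution nodes and for submitted jobs, uses the real-time monitoring data to model and quantify each node's overall computing capability, and combines node and job performance information to launch a dynamic resource scheduling scheme on the scheduler based on similarity assessment. The optimized system can effectively identify differences in execution capability among cluster nodes and perform fine-grained dynamic resource scheduling according to the real-time needs of job tasks. While refining YARN's existing scheduling semantics, it can also be deployed as a secondary resource scheduling scheme beneath an upper-level scheduler. The strategy was implemented and tested on Hadoop 2.0. Experimental results show that adaptive resource scheduling of jobs significantly improves resource utilization, raises cluster concurrency by a factor of 2 to 3, and improves time performance by nearly 10%.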
 

Keywords: heterogeneous cluster, monitoring, computing capability, dynamic scheduling, load adaptation

Abstract:

With the development and increasingly wide adoption of big data technology, the Hadoop YARN (Yet Another Resource Negotiator) scheduler is no longer an effective solution in heterogeneous cluster environments. On the one hand, YARN cannot dynamically allocate node resources, which wastes the resources of more capable nodes and leaves overall system performance underexploited. On the other hand, YARN’s existing static resource allocation policy ignores the differences between a job’s execution stages, which causes a large amount of resource fragmentation. To address these problems, we propose a load-adaptive feedback scheduling strategy. The system monitors the performance of all nodes and jobs and evaluates the computing power of each node from the real-time monitoring data. The scheduler then starts a dynamic resource scheduling strategy based on similarity assessment, using the monitoring information on node and job performance. The optimized system can distinguish the heterogeneity of different nodes, allocate resources dynamically according to tasks’ real-time needs, refine YARN’s scheduling semantics, and serve as a secondary resource scheduling strategy under an upper-level scheduler. We implement and test the strategy on Hadoop 2.0, and the experimental results show that it significantly improves resource utilization, improves the cluster’s concurrency by 2 to 3 times, and improves time performance by nearly 10%.

Key words: heterogeneous cluster, monitor, computing power, dynamic scheduling, load-adaptive
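
The abstract does not give the node model or the similarity measure, so the following is only a minimal sketch of how similarity-based matching between a task's resource demand and a node's monitored spare capacity might look. The three resource dimensions, the `Node` and `Task` classes, the `pick_node` helper, and the choice of cosine similarity are all assumptions made for illustration, not the authors' method.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical snapshot of a node's spare capacity (fractions in 0..1)."""
    name: str
    free_cpu: float
    free_mem: float
    free_io: float

    def vector(self):
        return (self.free_cpu, self.free_mem, self.free_io)


@dataclass
class Task:
    """Hypothetical resource demand of a task in its current stage."""
    name: str
    cpu: float
    mem: float
    io: float

    def vector(self):
        return (self.cpu, self.mem, self.io)


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fits(node, task):
    """A node is a candidate only if every demanded resource is available."""
    return all(free >= need for free, need in zip(node.vector(), task.vector()))


def pick_node(task, nodes):
    """Return the candidate whose spare-capacity profile best matches the task.

    Assumption: 'best match' means highest cosine similarity between the
    task's demand vector and the node's free-resource vector. Returns None
    when no node can host the task.
    """
    candidates = [n for n in nodes if fits(n, task)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: cosine_similarity(n.vector(), task.vector()))


# Illustrative data only; these numbers are not from the paper.
nodes = [
    Node("cpu-rich", free_cpu=0.9, free_mem=0.3, free_io=0.2),
    Node("mem-rich", free_cpu=0.3, free_mem=0.9, free_io=0.2),
]
cpu_bound = Task("map-stage", cpu=0.25, mem=0.05, io=0.05)
mem_bound = Task("reduce-stage", cpu=0.05, mem=0.25, io=0.05)
```

With this made-up data, `pick_node(cpu_bound, nodes)` selects the `cpu-rich` node and `pick_node(mem_bound, nodes)` selects the `mem-rich` node. Matching on the shape of the demand rather than on a fixed container size is one way to illustrate how per-stage differences in a job could influence placement. A real scheduler would refresh the node vectors from live monitoring data.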