• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学

• 高性能计算 • 上一篇    下一篇

一种基于两级DAG模型的MapReduce工作流异构调度算法

王宇新,王飞,王冠,郭禾   

  1. (大连理工大学计算机科学与技术学院,辽宁 大连 116023)
  • 收稿日期:2018-12-01 修回日期:2019-02-11 出版日期:2019-08-25 发布日期:2019-08-25
  • 基金资助:

    国家自然科学基金(11372067,61772112)

A MapReduce workflow heterogeneous scheduling
algorithm based on two-level DAG model
 

WANG Yu-xin,WANG Fei,WANG Guan,GUO He   

  1. (School of Computer Science and Technology,Dalian University of Technology,Dalian 116023,China)
     
  • Received:2018-12-01 Revised:2019-02-11 Online:2019-08-25 Published:2019-08-25

摘要:

MapReduce编程模型被广泛应用于大数据处理平台,而一个有效的任务调度算法对模型的运行效率至关重要。将MapReduce工作流的Map和Reduce阶段分别拆解为若干个有先后序限定关系的作业,每个作业再拆解为多个任务。之后基于计算集群的可用资源和任务异构性,构建面向作业和任务的2级有向无环图(DAG)模型,同时提出基于2级优先级排序的异构调度算法2-MRHS。算法的第1阶段进行优先级排序,即对作业和任务分别进行优先权值计算,再汇总得到任务的调度队列;第2阶段进行任务分配,即基于最快完成时间将每个任务所包含的数据块子任务分配给最适合的计算结点。采用大批量随机生成的DAG模型进行实验,结果表明与其他相关算法相比,本文算法有更短的调度长度(makespan)且更加稳定。
 

关键词: MapReduce, 工作流, 异构计算, 任务调度

Abstract:

The MapReduce programming model is widely applied in big data processing platforms, and an effective task scheduling algorithm is critical to the efficiency of the model. In our approach, a MapReduce workflow is decomposed as a number of jobs with successive qualifying relationships and each job has a Map phase and a Reduce phase that both contain multiple tasks. Based on the available resources and task heterogeneity of computing cluster, we construct a  two-level directed acyclic graph (DAG) model for job and tasks, and propose a MapReduce workflow heterogeneous scheduling algorithm based on two level priority ordering (2-MRHS). In the first stage of the algorithm, the priority ordering is performed: the priority weights of the job level and task level are calculated respectively to form the scheduling queue of tasks. In task assignment stage, the data block subtasks of each task are assigned to the appropriate computing node according to the tasks' earliest finish time (EFT). A large number of randomly generated DAG models are used to conduct experiments and the results show that our algorithm has shorter scheduling length (makespan) and better stability than those of others.
 

Key words: MapReduce, workflow, heterogeneous computing, task scheduling