• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (3): 146-151. doi: 10.3969/j.issn.1007130X.2011.

• Papers

Research on a Method for Solving the Blocking Problem During Single-Point Recovery in the MapReduce Model

ZHANG Zhaoning, PENG Yuxing

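  1. (National Laboratory for Parallel and Distributed Processing, Changsha, Hunan 410073)
  • Received: 2009-10-21 Revised: 2010-01-09 Online: 2011-03-25 Published: 2011-03-25
  • About the authors: ZHANG Zhaoning (born 1984), male, from Baoding, Hebei, M.S.; research interest: distributed computing. PENG Yuxing (born 1963), male, from Chaling, Hunan, Ph.D., research professor; research interest: distributed computing.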

A Method for Solving the Congestion Issue During the Single Node Recovering Based on the MapReduce Model

ZHANG Zhaoning,PENG Yuxing   

  1. (National Laboratory for Parallel and Distributed Processing, Changsha 410073, China)
  • Received: 2009-10-21 Revised: 2010-01-09 Online: 2011-03-25 Published: 2011-03-25

Abstract (translated from the Chinese):

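The MapReduce distributed programming model provides an important application platform for large-scale data-intensive computing. Its task scheduling follows a single-point control model, which keeps the architecture simple and task scheduling easy to control, but also exposes the system to failure of the central node. In Hadoop, to keep jobs on the working cluster from being interrupted after the central node fails, different versions have adopted three recovery mechanisms: on-demand synchronization, recovery from history records, and discarding. This paper analyzes in detail the data blocking, result consistency, and efficiency degradation problems that arise in these three mechanisms and, based on the characteristics of the two basic kinds of task dependency in the MapReduce model, proposes a synchronization mechanism that passes dependency information: by passing the existing dependencies between tasks during synchronization, it effectively solves the problems of the existing mechanisms.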

Keywords: MapReduce, Hadoop, task scheduling, single-point failure recovery, task dependency

Abstract:

The MapReduce model provides an important application platform for large-scale data-intensive computing. Its task scheduler runs on a single node, which keeps the architecture simple and makes the worker nodes easy to control, but also makes that node a single point of failure. To keep jobs running after the central node fails, different released versions of Hadoop (an open-source MapReduce implementation) have used three recovery mechanisms: synchronization on demand, recovery from history logs, and dropping. This paper analyzes the data blocking, result inconsistency, and efficiency loss that arise under these three mechanisms, and then proposes a synchronization method that passes task dependency information to solve these problems.

Key words: MapReduce; Hadoop; task scheduling; single-node failure recovery; task dependency