Paper

Research on a Method for Solving the Blocking Problem During Single-Point Recovery Based on the MapReduce Model

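  • (National Laboratory for Parallel and Distributed Processing, Changsha, Hunan 410073, China)
ZHANG Zhaoning (born 1984), male, from Baoding, Hebei; master's degree; research interest: distributed computing. PENG Yuxing (born 1963), male, from Chaling, Hunan; PhD, research professor; research interest: distributed computing.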

A Method for Solving the Congestion Issue During the Single Node Recovering Based on the MapReduce Model

  • (National Laboratory for Parallel and Distributed Processing, Changsha 410073, China)

Received date: 2009-10-21

  Revised date: 2010-01-09

  Online published: 2011-03-25

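Abstract (translated from the Chinese)

The MapReduce distributed programming model provides an important application platform for large-scale data-intensive computing. Its task scheduling follows a single-point control model, which keeps the architecture simple and makes task scheduling easy to control, but also exposes the system to failure of the central node. In Hadoop, to keep the jobs running on the working cluster from being interrupted after the central node fails, different versions have adopted three recovery mechanisms: on-demand synchronization, recovery from history records, and discarding. This paper analyzes in detail the data blocking, result consistency, and efficiency degradation problems that arise under these three mechanisms. Based on the characteristics of the two basic kinds of task dependency in the MapReduce model, it proposes a synchronization mechanism that transfers dependency information: by passing the existing inter-task dependencies during synchronization, it effectively resolves the problems of the existing mechanisms.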

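Cite this article

ZHANG Zhaoning, PENG Yuxing. A Method for Solving the Congestion Issue During the Single Node Recovering Based on the MapReduce Model[J]. Computer Engineering & Science, 2011, 33(3): 146-151. DOI: 10.3969/j.issn.1007-130X.2011.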

Abstract

The MapReduce model provides strong support for data-intensive supercomputing as a fundamental application platform. It uses a single-node task scheduler, which keeps the architecture simple and makes the worker nodes easy to control, but it also suffers from the single-node failure problem. Released versions of Hadoop (an open-source MapReduce implementation) have used three different recovery mechanisms: synchronization on demand, recovery from history logging, and dropping. This paper analyzes the data jams, result errors, and efficiency decline that occur under these three mechanisms, and then proposes a method that delivers task dependency information during synchronization to solve these problems.

