Computer Engineering & Science
A Method for Solving the Congestion Issue During the Single Node Recovering Based on the MapReduce Model
Received date: 2009-10-21
Revised date: 2010-01-09
Online published: 2011-03-25
The MapReduce model provides strong support for data-intensive supercomputing as a fundamental application platform. It uses a single-node task scheduler, which keeps the architecture simple and makes the worker nodes easy to control, but also introduces a single point of failure. Released versions of Hadoop (an open-source MapReduce implementation) offer three different recovery mechanisms: synchronization on demand, recovery from history logging, and dropping. This paper analyzes the data congestion, result errors, and efficiency decline that arise under these three mechanisms, and then proposes a method that delivers task-dependency information to solve these problems.
ZHANG Zhaoning, PENG Yuxing. A Method for Solving the Congestion Issue During the Single Node Recovering Based on the MapReduce Model[J]. Computer Engineering & Science, 2011, 33(3): 146-151. DOI: 10.3969/j.issn.1007130X.2011.