MapReduce: A New Programming Model for Distributed Parallel Computing

LI Chenghua，ZHANG Xinfang，JIN Hai，XIANG Wen

doi:10.3969/j.issn.1007130X.2011.

Computer Engineering & Science, 2011, Vol. 33, Issue 3: 129-135

(School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China)

Received date: 2009-12-29

Revised date: 2010-05-04

Online published: 2011-03-25

Abstract

MapReduce is a programming model introduced by Google for writing applications that rapidly process vast amounts of data in parallel on large clusters of computing nodes. The model is inspired by the map and reduce functions commonly used in functional programming. A MapReduce job usually splits the input dataset into independent chunks, which the map tasks process in a completely parallel manner. The reduce tasks then merge the intermediate values generated by the map tasks. Users need only specify the map and reduce functions. The MapReduce runtime system takes care of the details: partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. MapReduce is expected to be widely adopted on cloud computing platforms. Several aspects of Hadoop MapReduce, the implementation contributed by Apache, still need improvement.
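
The division of labor described above can be illustrated with a minimal single-machine sketch in Python. It is a toy under stated assumptions, not the Google or Hadoop implementation: the names `map_fn`, `reduce_fn`, and `run_job` are hypothetical, a thread pool stands in for the cluster nodes, and there is no fault handling or inter-machine communication. The user supplies only the two functions; the stand-in runtime does the splitting, grouping, and merging.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_fn(chunk):
    """User-supplied map: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_fn(key, values):
    """User-supplied reduce: merge all intermediate values that share a key."""
    return key, sum(values)


def run_job(chunks, map_fn, reduce_fn, workers=4):
    """Hypothetical toy runtime: map each chunk, group by key, then reduce."""
    # Map phase: chunks are independent, so they can be processed in parallel.
    # Threads here are only a stand-in for map tasks on separate machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, chunks))

    # Grouping phase: collect the intermediate values under their keys.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: one reduce call per distinct key.
    return dict(reduce_fn(key, values) for key, values in groups.items())


# The input dataset, already split into independent chunks.
chunks = ["the quick brown fox", "the lazy dog", "The fox"]
counts = run_job(chunks, map_fn, reduce_fn)
print(counts["the"], counts["fox"], counts["dog"])  # prints: 3 2 1
```

Swapping in a different `map_fn` and `reduce_fn` changes the computation without touching `run_job`, which is the separation between user code and runtime that the abstract describes.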

Key words: MapReduce; distributed parallel computing; cloud computing

Cite this article

LI Chenghua，ZHANG Xinfang，JIN Hai，XIANG Wen . MapReduce：a New Programming Model for Distributed Parallel Computing[J]. Computer Engineering & Science, 2011 , 33(3) : 129 -135 . DOI: 10.3969/j.issn.1007130X.2011.
