• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (3): 129-135. doi: 10.3969/j.issn.1007130X.2011.


MapReduce: A New Programming Model for Distributed Parallel Computing

LI Chenghua, ZHANG Xinfang, JIN Hai, XIANG Wen

  1. (School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China)
  • Received: 2009-12-29 Revised: 2010-05-04 Online: 2011-03-25 Published: 2011-03-25
  • About the authors: LI Chenghua (born 1972), male, from Xiantao, Hubei; postdoctoral researcher; CCF member (E200012566M); research interests: parallel computing and data mining. ZHANG Xinfang (born 1965), male, from Wuhua, Guangdong; Ph.D., professor; research interests: information security, cloud computing, and embedded systems and applications.

Abstract:

MapReduce is a distributed parallel programming model introduced by Google for processing large-scale data in parallel on large clusters of computing nodes. Inspired by the map and reduce functions common in functional programming languages, the model splits a large data-processing job into a number of Map tasks that run independently on different machines and produce intermediate files in some format; a number of Reduce tasks then merge these intermediate files to produce the final output files. Users can concentrate on writing the Map and Reduce functions, while the MapReduce runtime system handles the other complexities of parallel computing, such as the distributed file system, partitioning of the input data, job scheduling across machines, fault tolerance, and inter-machine communication, which greatly lowers the overall programming difficulty. MapReduce is increasingly becoming the mainstream programming model for cloud computing platforms. The open-source MapReduce system provided by the Apache Hadoop project still needs further improvement.

Key words: MapReduce; distributed parallel computing; cloud computing
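
The division of labor described in the abstract can be illustrated with a minimal single-process sketch of word counting. It is not the Google or Hadoop implementation: the names `map_words`, `reduce_counts`, and `run_job` are hypothetical, the input chunks are plain strings rather than files, and the grouping step stands in for the intermediate files that a real runtime would write and distribute across machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map function: emit an intermediate (word, 1) pair per word in one chunk."""
    for word in chunk.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce function: merge all intermediate values for one key."""
    return word, sum(counts)


def run_job(
    chunks: Iterable[str],
    map_fn: Callable[[str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Hypothetical stand-in for the runtime: map, group by key, then reduce."""
    # Map phase: chunks are independent, so a real runtime could process
    # each one on a different machine. Here they run sequentially.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for chunk in chunks:
        for key, value in map_fn(chunk):
            # Grouping by key plays the role of the intermediate files.
            intermediate[key].append(value)

    # Reduce phase: one reduce call per distinct key, in sorted key order.
    return dict(reduce_fn(key, values) for key, values in sorted(intermediate.items()))


if __name__ == "__main__":
    chunks = ["the quick brown fox", "the lazy dog", "the fox"]
    print(run_job(chunks, map_words, reduce_counts))
```

Only `map_words` and `reduce_counts` are specific to the problem; everything in `run_job` corresponds to what the abstract says the runtime system takes over from the user.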