• Official journal of the China Computer Federation (CCF)
  • Core journal of Chinese science and technology
  • Chinese core journal

J4 ›› 2013, Vol. 35 ›› Issue (11): 87-93.

• Papers •

A novel MapReduce parallel model in hybrid computing environment

TANG Bing1, HE Hai-wu2

  1. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
  2. Laboratoire de l’Informatique du Parallélisme, Ecole Normale Supérieure de Lyon, 69364 Lyon Cedex 07, France
  • Received: 2013-08-03 Revised: 2013-10-09 Online: 2013-11-25 Published: 2013-11-25
  • Supported by:

    Hunan Provincial Department of Education project (12C0121); Hunan University of Science and Technology doctoral research start-up fund (E51097); French National Research Agency project MapReduce (ANR-10-SEGI-001-01)

Abstract:

This paper proposes a novel MapReduce parallel computing model for hybrid computing environments. With this model, high-performance cluster nodes and heterogeneous desktop PCs on the Internet or an intranet can be combined into a hybrid computing environment in which MapReduce jobs process large-scale datasets, making full use of the computing and storage capacity of large numbers of desktop PCs. Like Hadoop, the model consists of two layers: a storage layer and a task layer. The paper briefly describes the model, its core HybridDFS distributed file system, and the MapReduce algorithms, then presents the design and implementation of a prototype system and its performance evaluation. The results show that the proposed hybrid computing model not only achieves reliable MapReduce computation but also reduces computation cost, and therefore has considerable potential.

Key words: hybrid computing environment; MapReduce; volunteer computing; fault tolerance; distributed file system