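• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)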

MRI:面向并行迭代的MapReduce模型

马志强,张力,杨双涛   

  1. (内蒙古工业大学信息工程学院,内蒙古 呼和浩特 010080)
  • 收稿日期:2016-08-25 修回日期:2016-10-19 出版日期:2016-12-25 发布日期:2016-12-25
  • Funding:

    National Natural Science Foundation of China (61540004); Natural Science Foundation of Inner Mongolia Autonomous Region (2014MS0608)

MRI:A MapReduce model for parallel iteration

MA Zhiqiang,ZHANG Li,YANG Shuangtao   

  1. (College of Information Engineering,Inner Mongolia University of Technology,Hohhot 010080,China)
  • Received:2016-08-25 Revised:2016-10-19 Online:2016-12-25 Published:2016-12-25

摘要:

机器学习领域内的多数模型均需要通过迭代计算以求解其最优参数,而MapReduce模型在迭代计算中的缺陷不足导致其在迭代计算中无法得到广泛应用。为解决上述矛盾,基于MapReduce模型提出并实现了一种可用于模型参数求解的并行迭代模型MRI。MRI模型在保持Map以及Reduce阶段的基础上,新增了Iterate阶段以及相关通信协议,实现了迭代过程中模型参数的更新、分发与迭代控制;通过对MapReduce状态机进行增强,实现了节点任务的重用,避免了迭代过程中节点任务重复创建、初始化以及回收带来的性能开销;在任务节点实现了数据缓存,保障了数据的本地性,并在Map节点增加了基于内存的块缓存机制,进一步提高训练集加载效率,以提高整体迭代效率。基于梯度下降算法的实验结果表明:MRI模型在并行迭代计算方面性能优于MapReduce模型。

关键词: MapReduce, 并行计算, 迭代计算, 机器学习

Abstract:

Most models in machine learning obtain their optimal parameters through iterative computation, yet MapReduce handles iteration poorly and has therefore not been widely adopted for such workloads. To address this, we propose and implement MRI, a parallel iterative model based on MapReduce for solving model parameters. MRI retains the Map and Reduce phases and adds an Iterate phase, together with the associated communication protocol, to update and distribute model parameters and to control the iteration. We enhance the MapReduce state machine so that node tasks are reused, which avoids the overhead of repeatedly creating, initializing, and reclaiming tasks in every iteration. MRI also caches data on the task nodes to preserve data locality, and adds a memory-based block cache on the Map nodes to speed up loading of the training set and thus the iteration as a whole. Experimental results with the gradient descent algorithm show that MRI outperforms MapReduce in parallel iterative computation.

Key words: MapReduce, parallel computing, iterative computing, machine learning
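
The Map, Reduce, and Iterate phases described in the abstract can be illustrated with a small single-process sketch. This is not the authors' implementation: the function names (`map_phase`, `reduce_phase`, `iterate_phase`, `run_mri`), the linear model, the learning rate, and the stopping rule are all assumptions chosen for illustration. The in-memory `blocks` list only stands in for the cached data blocks that reused Map tasks would hold across iterations.

```python
from typing import List, Tuple

Sample = Tuple[float, float]   # (x, y) training pair
Params = Tuple[float, float]   # (w, b) of the hypothetical model y = w*x + b


def map_phase(block: List[Sample], params: Params) -> Tuple[float, float, int]:
    """Compute partial gradient sums over one cached data block."""
    w, b = params
    grad_w = grad_b = 0.0
    for x, y in block:
        err = (w * x + b) - y
        grad_w += err * x
        grad_b += err
    return grad_w, grad_b, len(block)


def reduce_phase(partials: List[Tuple[float, float, int]]) -> Tuple[float, float]:
    """Combine the partial sums from all Map tasks into a mean gradient."""
    n = sum(p[2] for p in partials)
    return sum(p[0] for p in partials) / n, sum(p[1] for p in partials) / n


def iterate_phase(params: Params, grad: Tuple[float, float],
                  lr: float, tol: float) -> Tuple[Params, bool]:
    """Update the parameters and decide whether the iteration should stop."""
    new_params = (params[0] - lr * grad[0], params[1] - lr * grad[1])
    converged = max(abs(g) for g in grad) < tol
    return new_params, converged


def run_mri(blocks: List[List[Sample]], lr: float = 0.1,
            tol: float = 1e-6, max_iters: int = 10000) -> Tuple[Params, int]:
    """Drive the Map -> Reduce -> Iterate loop over the same blocks each round."""
    params: Params = (0.0, 0.0)
    for i in range(1, max_iters + 1):
        # The blocks are loaded once and reused, mimicking cached task data.
        partials = [map_phase(block, params) for block in blocks]
        grad = reduce_phase(partials)
        # The updated parameters are "distributed" by passing them to the next round.
        params, done = iterate_phase(params, grad, lr, tol)
        if done:
            return params, i
    return params, max_iters


if __name__ == "__main__":
    # Two blocks of samples drawn from y = 2x + 1.
    data = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
    (w, b), iterations = run_mri(data)
    print(f"w={w:.4f}, b={b:.4f}, iterations={iterations}")
```

In plain MapReduce, each pass of this loop would be a separate job that recreates its tasks and reloads its input. The sketch keeps the loop and the data inside one driver, which is the behaviour the Iterate phase and task reuse are meant to provide.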