• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (04): 571-578.

• Articles •

An analytical model and its applications for
minimizing total makespan of multiple MapReduce jobs                 

TIAN Wenhong1,2,CHEN  Yu2,WANG Xinyang2,XUE Ruini2,ZHAO Yong2   

  (1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054;
   2. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China)
  • Received: 2013-07-10  Revised: 2013-09-08  Online: 2014-04-25  Published: 2014-04-25

Abstract:

As large-scale MapReduce clusters become widely adopted for processing huge amounts of data, one of the critical challenges is to improve the service quality of these clusters by minimizing their makespan. We consider a scheduling model for multiple MapReduce jobs and observe that the order in which the jobs are executed can have a significant impact on their overall makespan. The goal of the paper is to design a framework for an automatic job scheduler and to propose an analytical model for minimizing the makespan of such a set of MapReduce jobs. By adopting a better strategy and implementation, we can meet the conditions of the classical Johnson algorithm and use it to find the optimal solution. Under the proposed strategy, the balanced pools problem can be solved exactly in linear time, which improves on existing simulation-based approaches. The analytical results can be applied to improve system response time, energy efficiency, and load balance in Hadoop cluster pools, and corresponding numerical examples validate our observations.

Key words: Hadoop; MapReduce; batch workloads; optimized schedule; minimized makespan
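
The abstract's observation that job order affects the overall makespan, and its use of the classical Johnson algorithm, can be illustrated with a minimal sketch. This is not the paper's scheduler. It assumes the textbook two-stage flow shop setting: each job has one map stage and one reduce stage, map stages run one after another, reduce stages run one after another, and a job's reduce stage starts only after its own map stage ends. The function names (johnson_order, makespan) and the job durations are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# A job is (name, map_time, reduce_time). All durations are made-up numbers.
Job = Tuple[str, float, float]


def johnson_order(jobs: List[Job]) -> List[Job]:
    """Order jobs by Johnson's rule for a two-stage flow shop.

    Jobs whose map stage is no longer than their reduce stage go first,
    sorted by increasing map time. The remaining jobs go last, sorted by
    decreasing reduce time.
    """
    first = sorted((j for j in jobs if j[1] <= j[2]), key=lambda j: j[1])
    last = sorted((j for j in jobs if j[1] > j[2]), key=lambda j: j[2], reverse=True)
    return first + last


def makespan(order: List[Job]) -> float:
    """Completion time of the last reduce stage for a given job order."""
    map_end = 0.0     # time at which the map stage of the current job ends
    reduce_end = 0.0  # time at which the reduce stage of the current job ends
    for _, map_time, reduce_time in order:
        map_end += map_time
        # A reduce stage waits for its own map stage and the previous reduce stage.
        reduce_end = max(reduce_end, map_end) + reduce_time
    return reduce_end


if __name__ == "__main__":
    jobs = [("J1", 4, 5), ("J2", 4, 1), ("J3", 30, 4), ("J4", 6, 30), ("J5", 2, 3)]
    print(makespan(jobs))                             # submission order: 77.0
    ordered = johnson_order(jobs)
    print([name for name, _, _ in ordered])           # ['J5', 'J1', 'J4', 'J3', 'J2']
    print(makespan(ordered))                          # Johnson order: 47.0
```

In this made-up instance, reordering alone reduces the makespan from 77 to 47 time units, which equals the lower bound given by the total map time plus the shortest reduce time (46 + 1). The sort makes this sketch O(n log n); it does not reproduce the linear-time balanced pools result stated in the abstract.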