• Journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

J4 ›› 2015, Vol. 37 ›› Issue (02): 207-212.

• Articles •

Effect of compression on Hadoop: A case study of
improving I/O performance on Hadoop

XIANG Lihui,MIAO  Li,ZHANG Dafang   

  1. (College of Computer Science and Electronic Engineering, Hunan University, Changsha 410086, China)
  • Received: 2013-10-10  Revised: 2013-12-03  Online: 2015-02-25  Published: 2015-02-25

Abstract:

Disk I/O performance has not kept pace with CPU performance, which has grown in line with Moore's law, and network I/O bandwidth is scarce, so I/O often becomes the bottleneck in data processing. Hadoop can store petabyte-scale (PB-level) data, where the I/O problem becomes even more pronounced. Compression is an important way to optimize I/O: it can reduce the I/O load and speed up data transfer on disk and over the network. In Hadoop, however, the benefits of compression have not been fully exploited. In this paper, we first analyze the compression algorithms supported by Hadoop and then propose a strategy that helps Hadoop users decide how to use compression and how to verify that choice through experiments. With compression, the performance of some Hadoop applications can be improved by up to 65%.

Key words: Hadoop; MapReduce; I/O; compression
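
The trade-off described in the abstract, CPU time spent compressing versus I/O time saved by moving fewer bytes, can be illustrated with a minimal sketch. This is not the authors' strategy or experiment: the sample data, the 100 MiB/s bandwidth figure, and the helper names `measure` and `io_time_saved` are all assumptions made for illustration. It uses Python's standard zlib (DEFLATE, the algorithm behind gzip) and bz2 modules, which correspond to two codec families that Hadoop supports.

```python
import bz2
import time
import zlib


def measure(name, compress, data):
    """Return (name, compressed/raw size ratio, seconds spent compressing)."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    return name, len(packed) / len(data), elapsed


def io_time_saved(raw_bytes, ratio, bandwidth_bytes_per_s):
    """Seconds of transfer time avoided by moving compressed instead of raw bytes."""
    return raw_bytes * (1.0 - ratio) / bandwidth_bytes_per_s


# Hypothetical sample: repetitive log-like text, which compresses well.
sample = b"2015-02-25 INFO map task finished, records=1000\n" * 20000

results = [
    measure("zlib (DEFLATE, as in gzip)", zlib.compress, sample),
    measure("bzip2", bz2.compress, sample),
]

# Assumed link speed (disk or network); not a figure from the paper.
ASSUMED_BANDWIDTH = 100 * 1024 * 1024  # 100 MiB/s

for name, ratio, cpu_seconds in results:
    saved = io_time_saved(len(sample), ratio, ASSUMED_BANDWIDTH)
    print(f"{name}: ratio={ratio:.3f}, cpu={cpu_seconds:.4f}s, io_saved={saved:.4f}s")
```

Under this simplified model, a codec is worth enabling when the I/O time saved exceeds the CPU time spent; real Hadoop jobs also pay decompression cost and depend on whether the compressed format can be split across map tasks.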