• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2015, Vol. 37 ›› Issue (02): 207-212.

• Papers •

Research on the effect of compression on Hadoop performance

XIANG Lihui, MIAO Li, ZHANG Dafang

  1. (College of Information Science and Engineering, Hunan University, Changsha, Hunan 410086)
  • Received: 2013-10-10 Revised: 2013-12-03 Online: 2015-02-25 Published: 2015-02-25
  • Supported by:

    National 973 Program of China (2012CB315805); National Natural Science Foundation of China (61173167)

Effect of compression on Hadoop: A case study of improving I/O performance on Hadoop

XIANG Lihui,MIAO  Li,ZHANG Dafang   

  1. (College of Computer Science and Electronic Engineering, Hunan University, Changsha 410086, China)
  • Received: 2013-10-10 Revised: 2013-12-03 Online: 2015-02-25 Published: 2015-02-25

Abstract:

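Disk I/O has never kept pace with CPU performance, which advances according to Moore's law, and network I/O resources are scarce, so I/O often becomes the bottleneck of data processing. Hadoop can store petabyte-scale data, which makes the I/O problem even more pronounced. Compression is an important method of I/O tuning: it reduces I/O load and speeds up data transfer on disk and over the network. This paper first analyzes the characteristics of the compression algorithms in Hadoop and derives a compression usage strategy that helps Hadoop users decide how to use compression; the strategy is then verified and supplemented by experiments. Based on this strategy, the efficiency of some Hadoop applications can be improved by 65% when compression is used appropriately.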

Key words: Hadoop, MapReduce, I/O, compression

Abstract:

Disk I/O performance has never kept pace with CPU performance, which follows Moore's law, and network I/O is a scarce resource, so I/O often becomes the bottleneck of data processing. Hadoop can store petabyte-scale data, where the I/O problem becomes even more pronounced. Compression is an important way to optimize I/O: it reduces I/O load and speeds up data transfer on disk and over the network. In Hadoop, however, the benefits of compression have not been fully exploited. In this paper, we first analyze the compression algorithms supported by Hadoop and then propose a strategy that helps Hadoop users decide how to use compression; the strategy is verified and supplemented by experiments. With appropriate use of compression, the performance of some Hadoop applications can be improved by up to 65%.

Key words: Hadoop; MapReduce; I/O; compression