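• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science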

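Hadoop System Architecture Integrating I/O Hardware Compression Accelerators

LEI Li1, QIAN Binhai1, GUO Jun1, GU Xiongli2, LIU Peng1

  (1. College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, China; 2. Huawei Technologies Co., Ltd., Hangzhou, Zhejiang 310051, China)
  • Received: 2016-04-20 Revised: 2016-06-15 Online: 2016-08-25 Published: 2016-08-25
  • Funding:

    Huawei Technologies Co., Ltd. funded project (YB2014100047)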

Abstract:

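With the development of big data, Hadoop has become one of the important tools for big data processing. In practical applications, however, Hadoop's I/O operations limit further gains in system performance. Hadoop usually reduces its I/O by compressing data in software, but software compression is slow, so we replace it with a hardware compression accelerator. Because Hadoop runs on the Java virtual machine, it cannot directly invoke the underlying I/O hardware compression accelerator. Two key problems must therefore be solved: obtaining the data to be compressed from the Hadoop system, and streaming that data to the I/O hardware compression accelerator. We solve them by implementing Hadoop compressor/decompressor classes and designing a C++ dynamic link library, which together integrate the I/O hardware compression accelerator into the Hadoop system framework. Experimental results show that the accelerator compresses 15.9 Byte/s per Hz of clock frequency (15.9 Byte/s/Hz), and that integrating it improves Hadoop system performance by a factor of 2.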
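The integration described above hinges on a compressor class that collects data on the Java side and hands it across to native code. Below is a minimal sketch of that buffering contract, assuming a single-block policy. The method names mirror a subset of Hadoop's `org.apache.hadoop.io.compress.Compressor` interface (`setInput`, `needsInput`, `finish`, `finished`, `compress`, `reset`), but the sketch has no Hadoop dependency. The class `AcceleratorCompressorSketch`, the `CompressionBackend` interface, and the staging policy are all hypothetical, since the abstract does not give the paper's actual classes. In the real design the backend call would cross JNI into the C++ dynamic link library that drives the accelerator. Here `java.util.zip.Deflater` stands in so the code runs without hardware.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: method names mirror part of Hadoop's Compressor contract
// (setInput/needsInput/finish/finished/compress/reset) but nothing here depends on Hadoop.
public class AcceleratorCompressorSketch {

    // Hypothetical native boundary; in the paper's design this role is played by the
    // C++ dynamic link library that talks to the accelerator (assumed to be reached via JNI).
    interface CompressionBackend {
        byte[] compressBlock(byte[] block, int len);
    }

    // Software stand-in (java.util.zip.Deflater) so the sketch runs without hardware.
    static final class DeflaterBackend implements CompressionBackend {
        @Override
        public byte[] compressBlock(byte[] block, int len) {
            Deflater d = new Deflater();
            d.setInput(block, 0, len);
            d.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!d.finished()) {
                int n = d.deflate(buf);
                out.write(buf, 0, n);
            }
            d.end();
            return out.toByteArray();
        }
    }

    private final CompressionBackend backend;
    private final byte[] inBuf;          // staging buffer handed to the backend in one call
    private int inLen = 0;
    private byte[] outBuf = new byte[0]; // compressed bytes not yet drained by compress()
    private int outPos = 0;
    private boolean finishCalled = false;
    private boolean blockDone = false;

    public AcceleratorCompressorSketch(CompressionBackend backend, int capacity) {
        this.backend = backend;
        this.inBuf = new byte[capacity];
    }

    // Copies caller data into the staging buffer; this sketch handles a single block only.
    public void setInput(byte[] b, int off, int len) {
        if (inLen + len > inBuf.length) throw new IllegalStateException("staging buffer full");
        System.arraycopy(b, off, inBuf, inLen, len);
        inLen += len;
    }

    public boolean needsInput() { return !finishCalled; }

    public void finish() { finishCalled = true; }

    // True once the block has been compressed and all of its output drained.
    public boolean finished() { return blockDone && outPos == outBuf.length; }

    // First call after finish() sends the staged block to the backend; every call drains output into b.
    public int compress(byte[] b, int off, int len) {
        if (finishCalled && !blockDone) {
            outBuf = backend.compressBlock(inBuf, inLen);
            outPos = 0;
            blockDone = true;
        }
        int n = Math.min(len, outBuf.length - outPos);
        System.arraycopy(outBuf, outPos, b, off, n);
        outPos += n;
        return n;
    }

    public void reset() {
        inLen = 0; outBuf = new byte[0]; outPos = 0; finishCalled = false; blockDone = false;
    }

    // Drives the sketch the way a stream wrapper would, then checks the round trip with Inflater.
    public static void main(String[] args) throws DataFormatException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50; i++) sb.append("hadoop block ");
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);

        AcceleratorCompressorSketch c = new AcceleratorCompressorSketch(new DeflaterBackend(), 1 << 16);
        c.setInput(data, 0, data.length);
        c.finish();
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        byte[] chunk = new byte[16]; // small chunks exercise the draining loop
        while (!c.finished()) {
            int n = c.compress(chunk, 0, chunk.length);
            compressed.write(chunk, 0, n);
        }
        byte[] comp = compressed.toByteArray();

        Inflater inf = new Inflater();
        inf.setInput(comp);
        byte[] restored = new byte[data.length];
        int got = inf.inflate(restored);
        inf.end();
        System.out.println("original=" + data.length + " compressed=" + comp.length
                + " roundTrip=" + (got == data.length && Arrays.equals(data, restored)));
    }
}
```

One design choice here is our assumption, not the paper's: the whole input is staged and passed to the backend in a single call, so the JVM-to-native crossing happens once per block rather than once per `setInput`. A production version would also split large inputs into length-framed blocks, which this sketch omits.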

Key words: Hadoop, I/O, hardware compression accelerator
