• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2013, Vol. 35 ›› Issue (10): 58-64.

• Articles •

Research on a Redis-based distributed storage method for massive small files

LIU Gaojun, WANG Diao

  1. (School of Information Engineering, North China University of Technology, Beijing 100144, China)
  • Received: 2013-03-05 Revised: 2013-09-15 Online: 2013-10-25 Published: 2013-10-25
  • Supported by:

    National Key Technology R&D Program of the Ministry of Science and Technology of China (2012BAH04F01); Science and Technology Innovation Platform (PXM2013_014212_000011)

Abstract:

Small files are a common vehicle for transmitting and storing information, and users demand increasingly high reliability and speed when handling them. To address the low efficiency of existing small-file storage, we combine the large-file storage strengths of the distributed file system HDFS with Redis caching and propose a scheme that quickly merges small files. Small files are merged into a Sequence File that is stored on HDFS, load coefficients determined by multiple linear regression analysis are used for load balancing, and a cache is used when files are retrieved to keep access efficient. For the experiments, we built a corresponding file platform and compared upload, retrieval, deletion, and memory usage against the traditional approach of uploading files directly to HDFS. The results show that, compared with direct upload, the improved small-file handling processes small files faster while preserving file reliability.

Key words: HDFS; small file; file cache; distributed file system
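
The merge-then-cache idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: an in-memory buffer stands in for the Sequence File on HDFS, an LRU dictionary stands in for the Redis cache, and the class name, method names, and `cache_capacity` parameter are all hypothetical. The regression-based load balancing is not modeled here.

```python
import io
from collections import OrderedDict


class MergedSmallFileStore:
    """Toy model: one append-only container, an index, and a read cache."""

    def __init__(self, cache_capacity=2):
        # Assumption: a BytesIO buffer stands in for a merged file on HDFS.
        self.container = io.BytesIO()
        # Maps file name -> (offset, length) inside the container.
        self.index = {}
        # Assumption: an LRU dict stands in for a Redis cache.
        self.cache = OrderedDict()
        self.cache_capacity = cache_capacity
        # Counts reads that had to go to the container (cache misses).
        self.container_reads = 0

    def put(self, name, data):
        """Append a small file to the container and record where it lives."""
        offset = self.container.seek(0, io.SEEK_END)
        self.container.write(data)
        # Re-uploading a name repoints the index; the old bytes are orphaned.
        self.index[name] = (offset, len(data))
        self.cache.pop(name, None)  # drop any stale cached copy

    def get(self, name):
        """Return file contents, serving from the cache when possible."""
        if name in self.cache:
            self.cache.move_to_end(name)  # mark as most recently used
            return self.cache[name]
        offset, length = self.index[name]
        self.container.seek(offset)
        data = self.container.read(length)
        self.container_reads += 1
        self.cache[name] = data
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return data

    def delete(self, name):
        """Remove a file logically; its bytes stay in the container."""
        del self.index[name]
        self.cache.pop(name, None)


store = MergedSmallFileStore()
store.put("a.txt", b"alpha")
store.put("b.txt", b"bravo!")
first = store.get("b.txt")   # read from the merged container
second = store.get("b.txt")  # served from the cache
```

In this toy model the second `get` does not touch the container, which is the effect the caching layer is meant to provide for frequently accessed small files.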