• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2013, Vol. 35 ›› Issue (10): 58-64.

• Articles •

Research of Redisbased distributed
storage method for massive small files 

LIU Gaojun,WANG Diao   

  1.  (School of Information Engineering, North China University of Technology, Beijing 100144, China)
  • Received: 2013-03-05 Revised: 2013-09-15 Online: 2013-10-25 Published: 2013-10-25

Abstract:

Small files are an important means of transmitting and storing information and are widely used in many fields, but the reliability and speed with which they are stored and accessed still need improvement. To address the inefficiency of small-file storage, we propose a fast small-file merging scheme that combines the large-file storage strengths of the distributed storage system HDFS with Redis caching. Small files are merged into a SequenceFile, which is then stored in HDFS. Load is balanced using load coefficients determined by multiple linear regression analysis, and a cache keeps file access efficient. In the experiments, we build a corresponding file platform and compare upload, access, deletion, and memory footprint against traditional direct upload. The results show that, compared with uploading files directly to HDFS, the improved small-file handling preserves file reliability and makes user operations on small files faster.

Key words: HDFS; small file; file cache; distributed file system
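
The merge-and-cache idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes an in-memory byte buffer in place of a SequenceFile on HDFS and a plain dictionary in place of Redis, and the class and method names (SmallFileStore, put, get, delete) are hypothetical. It only shows why merging many small files into one container, with a cached name-to-location index, lets each file be read back with a single lookup.

```python
import io


class SmallFileStore:
    """Toy model: one append-only container plus a cached index.

    Assumptions (not from the paper): the container is an in-memory
    buffer rather than a SequenceFile on HDFS, and the index is a dict
    rather than a Redis instance.
    """

    def __init__(self):
        self.container = io.BytesIO()  # stands in for the merged file
        self.index = {}                # stands in for the cache: name -> (offset, length)

    def put(self, name, data):
        # Append the small file to the end of the container and record where it landed.
        offset = self.container.seek(0, io.SEEK_END)
        self.container.write(data)
        self.index[name] = (offset, len(data))

    def get(self, name):
        # One index lookup, then one ranged read from the container.
        offset, length = self.index[name]
        self.container.seek(offset)
        return self.container.read(length)

    def delete(self, name):
        # Drop only the index entry; reclaiming the bytes would need a
        # separate compaction pass, which is not shown here.
        del self.index[name]


store = SmallFileStore()
store.put("a.txt", b"hello")
store.put("b.txt", b"world!")
print(store.get("b.txt"))
```

In a real deployment the index lookups would go to the cache server and the ranged reads to the distributed file system; the load-coefficient balancing mentioned in the abstract is not modeled here because the abstract does not give its details.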