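• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science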

• High Performance Computing

An HDFS storage optimization strategy based on Cauchy code

XIE Guojun, SHEN Jiquan, YANG Huanhuan

  1. (School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo 454000, China)
  • Received: 2018-04-26 Revised: 2018-07-12 Online: 2019-03-25 Published: 2019-03-25
  • Supported by:

    Basic and Frontier Research Project of Henan Province (152300410212)

Abstract:

With the advent of the big data era, data storage faces severe challenges. The traditional Hadoop distributed file system (HDFS) suffers from high storage redundancy and insufficient load balancing. To address these problems, we propose a Cauchy-code-based strategy, Cauchy dynamic decentralized storage (CDDS). For each data block in the system, the strategy generates a storage scheme according to the block's heat level while ensuring data availability. Cold data and hot data are both encoded with a Cauchy-based erasure code and are stored as a single copy and as multiple copies, respectively, which preserves both the reliability of the data and the I/O capability of the system. Test results show that the CDDS strategy reduces the required storage space to 75% of the original and improves the system's reliability and load balancing capability.

Key words: data storage, Cauchy code, dynamic replica, load balancing
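
To illustrate the kind of Cauchy-based erasure coding the abstract refers to, here is a minimal sketch. It is not the authors' CDDS implementation. The field GF(2^8) with reduction polynomial 0x11d, the shard counts `k` and `m`, and all function names are assumptions made for illustration, since the abstract does not specify them. Heat-based selection between single-copy and multi-copy storage is not shown.

```python
from __future__ import annotations


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse, computed as a^254 because a^255 == 1 for nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def cauchy_matrix(k: int, m: int) -> list[list[int]]:
    """m x k Cauchy matrix with entries 1 / (x_i + y_j), x_i = i, y_j = m + j.

    Addition in GF(2^8) is XOR; the two index sets are disjoint, so no entry
    divides by zero. Requires k + m <= 256.
    """
    return [[gf_inv(i ^ (m + j)) for j in range(k)] for i in range(m)]


def encode(data: list[bytes], m: int) -> list[bytes]:
    """Compute m parity shards from k equal-length data shards."""
    size = len(data[0])
    parity = []
    for row in cauchy_matrix(len(data), m):
        shard = bytearray(size)
        for coeff, block in zip(row, data):
            for pos, byte in enumerate(block):
                shard[pos] ^= gf_mul(coeff, byte)
        parity.append(bytes(shard))
    return parity


def decode(shards: dict[int, bytes], k: int, m: int) -> list[bytes]:
    """Recover the k data shards from any k survivors.

    Keys below k are data shard indices; key k + i is parity shard i.
    """
    matrix = cauchy_matrix(k, m)
    rows = []
    for idx in sorted(shards)[:k]:
        coeffs = [int(j == idx) for j in range(k)] if idx < k else list(matrix[idx - k])
        rows.append((coeffs, bytearray(shards[idx])))
    # Gauss-Jordan elimination over GF(2^8), applied to every byte position.
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][0][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        coeffs, rhs = rows[col]
        inv = gf_inv(coeffs[col])
        coeffs[:] = [gf_mul(inv, c) for c in coeffs]
        rhs[:] = bytes(gf_mul(inv, b) for b in rhs)
        for r in range(k):
            factor = rows[r][0][col]
            if r != col and factor:
                other_coeffs, other_rhs = rows[r]
                for j in range(k):
                    other_coeffs[j] ^= gf_mul(factor, coeffs[j])
                for pos in range(len(rhs)):
                    other_rhs[pos] ^= gf_mul(factor, rhs[pos])
    return [bytes(rows[j][1]) for j in range(k)]


# Usage: 3 data shards, 2 parity shards; data shards 0 and 2 are lost.
data = [b"hot!", b"cold", b"warm"]
parity = encode(data, m=2)
survivors = {1: data[1], 3: parity[0], 4: parity[1]}
assert decode(survivors, k=3, m=2) == data
```

Every square submatrix of a Cauchy matrix is invertible, so in this sketch any `k` of the `k + m` shards are enough to rebuild the data. That property is what lets an erasure-coded single copy tolerate up to `m` lost shards.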