• Official journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2020, Vol. 42, Issue 10 (High-Performance Computing Special Issue): 1711-1719.

• High-Performance Computer Architecture

A Regional Shared and High Concurrent Storage Architecture Based on an NVMeoF Storage Pool

LI Qiong, SONG Zhen-long, YUAN Yuan, XIE Xu-chao

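  1. (School of Computer, National University of Defense Technology, Changsha 410073, Hunan, China)
  • Received: 2020-06-11  Revised: 2020-07-12  Accepted: 2020-10-25  Publication date: 2020-10-25  Release date: 2020-10-23
  • Funding:
    National Key R&D Program of China (2018YFB0204301)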

A regional shared and high concurrent storage architecture based on NVMeoF storage pool

LI Qiong, SONG Zhen-long, YUAN Yuan, XIE Xu-chao

  1. (School of Computer, National University of Defense Technology, Changsha 410073, China)
  • Received: 2020-06-11  Revised: 2020-07-12  Accepted: 2020-10-25  Online: 2020-10-25  Published: 2020-10-23

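Abstract: In the era of exascale computing and big data, many big data applications run on high-performance computing (HPC) systems to make full use of the parallel computing power of supercomputers. As a result, supercomputer I/O patterns are growing more complex and the I/O bottleneck is becoming increasingly severe. Flash-based storage arrays and storage servers are gradually being adopted in the parallel storage systems of high-performance computers, but the relatively high latency of traditional storage architectures, I/O protocol software stacks, and storage networks prevents the new storage media from delivering their performance advantages. Storage systems therefore still suffer from high I/O access latency and limited concurrent I/O throughput and burst I/O bandwidth. To address these problems and technical challenges, this paper proposes a regional shared concurrent storage architecture based on non-volatile memory (NVM) storage media and designs NV-BSP, a burst I/O buffer storage pool that supports NVMeoF network storage. NV-BSP implements key techniques including virtualized storage pool resource management and NVMeoF network storage communication over the Tianhe high-speed interconnect network. It scales both horizontally and vertically and can effectively support burst I/O acceleration and low-latency remote storage access for specific computing tasks. Based on a performance analysis model for mixed HPC and big data workloads, a QoS control strategy for mixed applications is proposed. Performance evaluation on a small-scale prototype system shows that the read and write performance of the NV-BSP storage pool scales well with the number of concurrent I/O handling threads; that NV-BSP clearly outperforms MD-RAID, which ships with the Linux operating system; and that, compared with local I/O access, NVMeoF remote storage over the Tianhe interconnect network increases read and write latency by only 59.25 μs and 54.03 μs, respectively. By separating computation from storage, NV-BSP delivers performance comparable to a local storage pool while improving system reliability and the flexibility of dynamic storage resource allocation.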
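The abstract does not describe how the virtualized storage pool lays data out across its NVMe SSDs. As a purely illustrative sketch, assuming RAID-0-style round-robin striping (the layout used by MD-RAID level 0, the baseline named above; whether NV-BSP uses the same layout is not stated), the following Python splits one logical request into per-device extents. `Extent`, `map_request`, `num_devices`, and `stripe_unit` are hypothetical names, not part of NV-BSP.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """One contiguous piece of a request on a single device (hypothetical type)."""
    device: int  # index of the NVMe SSD within the pool
    offset: int  # byte offset on that device
    length: int  # number of bytes in this piece


def map_request(offset: int, length: int, num_devices: int, stripe_unit: int) -> list[Extent]:
    """Split a logical (offset, length) request across a RAID-0-style striped pool.

    Assumption: stripe units are assigned to devices round-robin, so logical
    unit k lives on device k % num_devices at row k // num_devices.
    """
    if num_devices <= 0 or stripe_unit <= 0:
        raise ValueError("num_devices and stripe_unit must be positive")
    extents: list[Extent] = []
    pos, end = offset, offset + length
    while pos < end:
        unit = pos // stripe_unit          # global stripe-unit number
        within = pos % stripe_unit         # byte offset inside that unit
        device = unit % num_devices        # round-robin device choice
        row = unit // num_devices          # stripe row on the chosen device
        chunk = min(stripe_unit - within, end - pos)  # stop at unit or request end
        extents.append(Extent(device, row * stripe_unit + within, chunk))
        pos += chunk
    return extents


# Example: an 8 KiB request starting at 6 KiB, 4 SSDs, 4 KiB stripe unit.
for piece in map_request(6144, 8192, num_devices=4, stripe_unit=4096):
    print(piece)
```

Because consecutive stripe units land on different SSDs, a large or bursty request is served by several devices in parallel; in the example above the 8 KiB request touches devices 1, 2, and 3.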


Key words: storage system architecture, burst buffer, NVMe SSD, NVMeoF, high-performance computing, big data

Abstract:

In the era of exascale computing and big data, High Performance Computing (HPC) systems have been widely deployed as infrastructure for big data analytics in order to leverage their parallel computing capabilities. As I/O patterns in HPC systems grow increasingly complicated and heterogeneous, breaking through the I/O bottleneck has become both challenging and urgent. In recent years, flash-based storage arrays and storage servers have gradually been deployed in HPC storage systems. However, conventional shared storage architectures, I/O software stacks, and storage networking designs were built primarily for Hard Disk Drives (HDDs); they induce severe overhead along the I/O path and prevent HPC storage systems from taking full advantage of the performance benefits of Non-Volatile Memory (NVM). To achieve low I/O latency, high concurrent I/O throughput, and high burst I/O bandwidth, this paper proposes a regional shared and high concurrent storage architecture. We design an NVMeoF-based burst I/O storage pool (NV-BSP), which implements key techniques such as virtualized storage pool resource management and NVMeoF network storage communication over the Tianhe high-speed interconnect network. NV-BSP scales both horizontally and vertically and can effectively support burst I/O acceleration and low-latency remote storage access for specific computing tasks. We further propose a Quality-of-Service (QoS) control strategy for storage systems running mixed HPC and big data applications. Experimental results on a prototype system show that the read and write performance of NV-BSP scales as the number of I/O handling threads increases. Compared with the built-in MD-RAID in Linux, NV-BSP achieves higher I/O bandwidth. Compared with a node-local storage pool, the I/O latency of NVMeoF-based remote storage increases by only 59.25 μs for reads and 54.03 μs for writes.
By disaggregating storage from computation, NV-BSP significantly improves system scalability and reliability while delivering performance comparable to local storage.
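The abstract states only that the QoS control strategy is derived from a performance model of mixed HPC and big data workloads; the strategy itself is not described here. To illustrate what per-application-class I/O admission control can look like, the sketch below swaps in a standard token bucket per class. This is not the authors' method: `TokenBucket`, `QosController`, the class names `"hpc"` and `"bigdata"`, and all rates and capacities are hypothetical assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Classic token bucket; tokens are counted in MB of I/O (an assumption)."""
    rate: float          # tokens refilled per second
    capacity: float      # maximum tokens, i.e. the largest burst admitted at once
    tokens: float = 0.0  # tokens currently available
    last: float = 0.0    # time of the last refill, in seconds

    def try_consume(self, amount: float, now: float) -> bool:
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False  # caller should queue or retry the request later


class QosController:
    """Per-application-class admission control (hypothetical design)."""

    def __init__(self, buckets: dict[str, TokenBucket]) -> None:
        self.buckets = buckets

    def admit(self, app_class: str, size_mb: float, now: float) -> bool:
        return self.buckets[app_class].try_consume(size_mb, now)


# Hypothetical policy: HPC checkpoints may burst, big data jobs are rate-capped.
qos = QosController({
    "hpc": TokenBucket(rate=1000, capacity=4000, tokens=4000),
    "bigdata": TokenBucket(rate=100, capacity=200, tokens=200),
})
print(qos.admit("hpc", 3000, now=0.0))      # True: large burst fits in the HPC bucket
print(qos.admit("bigdata", 150, now=0.0))   # True: within the big data budget
print(qos.admit("bigdata", 150, now=0.5))   # False: only 100 MB available
print(qos.admit("bigdata", 150, now=1.0))   # True: refilled to 150 MB
```

A large capacity lets bursty HPC I/O (such as checkpoints) be absorbed at once, while a small capacity and rate keep sustained big data traffic from starving it; a real policy would derive these parameters from a performance model such as the one the paper describes.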


Key words: storage architecture, burst buffer, NVMe SSD, NVMe over fabrics, high performance computing, big data