A regional shared and high concurrent storage architecture based on NVMeoF storage pool

Computer Engineering & Science, 2020, Vol. 42, Issue 10 (High Performance Special Issue): 1711-1719.

• High Performance Computer Architecture •

LI Qiong, SONG Zhen-long, YUAN Yuan, XIE Xu-chao

(School of Computer, National University of Defense Technology, Changsha 410073, China)

Received: 2020-06-11 Revised: 2020-07-12 Online: 2020-10-25 Published: 2020-10-23
Funding:
National Key Research and Development Program of China (2018YFB0204301)

Abstract:

In the era of exascale computing and big data, many big data applications run on High Performance Computing (HPC) systems to exploit their parallel computing capability. As a result, the I/O patterns of supercomputers are becoming more complex and heterogeneous, and the I/O bottleneck is becoming more severe. Flash-based storage arrays and storage servers have gradually been deployed in the parallel storage systems of HPC machines. However, conventional shared storage architectures, I/O protocol software stacks, and storage networks were designed primarily for Hard Disk Drives (HDDs). The latency they add to the I/O path prevents Non-Volatile Memory (NVM) media from delivering their performance advantages, so storage systems still suffer from high I/O access latency, limited concurrent I/O throughput, and limited burst I/O bandwidth. To address these problems, this paper proposes a regional shared and high concurrent storage architecture based on NVM. We design NV-BSP, a burst I/O buffer storage pool that supports NVMeoF network storage, and implement key techniques including virtualized storage pool resource management and NVMeoF network storage communication over the Tianhe high-speed interconnect. NV-BSP can scale both horizontally and vertically, and it effectively supports burst I/O acceleration and low-latency remote storage access for specific computing tasks. Based on a performance analysis model for mixed HPC and big data workloads, we further propose a Quality-of-Service (QoS) control strategy for mixed applications. Experimental results on a small-scale prototype system show that the read and write performance of NV-BSP scales well with the number of concurrent I/O handling threads, and that NV-BSP clearly outperforms the MD-RAID built into Linux, achieving higher I/O bandwidth. Compared with local I/O access, NVMeoF-based remote storage over the Tianhe interconnect adds only 59.25 μs of read latency and 54.03 μs of write latency. By disaggregating storage from computation, NV-BSP delivers performance comparable to a local storage pool while improving the flexibility of dynamic storage resource allocation and system reliability.

Key words: storage architecture, burst buffer, NVMe SSD, NVMe over Fabrics (NVMeoF), high performance computing, big data

LI Qiong, SONG Zhen-long, YUAN Yuan, XIE Xu-chao. A regional shared and high concurrent storage architecture based on NVMeoF storage pool[J]. Computer Engineering & Science, 2020, 42(10, High Performance Special Issue): 1711-1719.
