• China Computer Federation (CCF) journal
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2024, Vol. 46, Issue (02): 209-216.

• High Performance Computing •

Analysis and evaluation of congestion control in interconnection networks for high performance computing

SUN Yan, ZHANG Jian-min, LI Yuan, SUN Shun-yu

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

  • Received: 2023-09-06  Revised: 2023-10-27  Accepted: 2024-02-25  Online: 2024-02-25  Published: 2024-02-24
  • Funding:
    National Key Research and Development Program of China (2022YFB2803405); National Defense Science and Technology Key Laboratory Project (WDZC20235250114)

Abstract: As high performance computing (HPC) technology advances, HPC systems contain ever more network nodes and HPC applications place ever more stringent demands on network performance, putting congestion control in high performance interconnection networks under great pressure. Efficient, low-overhead congestion control methods designed around the characteristics of HPC interconnection networks are therefore key to ensuring the performance and stability of these networks. Addressing this core problem of interconnect communication in HPC systems, this paper analyzes the mainstream congestion control methods and compares them experimentally. Based on the structural characteristics and communication properties of HPC systems, it designs a data flow model and a flow file generation tool for large-scale simulation, and proposes a comprehensive evaluation metric for congestion control. Using the proposed data flow model, several congestion control methods are simulated on a relatively large network, and their performance is analyzed and evaluated with the proposed metric. The proposed analysis and evaluation techniques allow congestion control methods for high performance interconnection networks to be assessed more objectively and accurately.

Key words: high performance computing, congestion control, flow control, RDMA network
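The abstract does not define how its comprehensive evaluation metric is computed. Purely as an illustration, and not the authors' index, one common way to fold several congestion-control measurements into a single number is to min-max normalize each metric across the compared methods, orient it so that 1.0 is best, and take a weighted sum. The metric names (`avg_fct_us`, `p99_fct_us`, `throughput_gbps`, `pause_ratio`), the weights, and the helper `composite_scores` below are all hypothetical assumptions.

```python
"""Toy composite congestion-control score.

Hypothetical illustration only: this is NOT the evaluation index proposed in
the paper, whose definition is not given in the abstract.
"""
from dataclasses import dataclass


@dataclass
class RunResult:
    """Summary of one simulated method; all field names are assumptions."""
    name: str
    avg_fct_us: float       # mean flow completion time (lower is better)
    p99_fct_us: float       # 99th-percentile flow completion time (lower is better)
    throughput_gbps: float  # aggregate goodput (higher is better)
    pause_ratio: float      # fraction of time links spent paused (lower is better)


# Hypothetical weights; they sum to 1, so every score lies in [0, 1].
WEIGHTS = {
    "avg_fct_us": 0.3,
    "p99_fct_us": 0.3,
    "throughput_gbps": 0.3,
    "pause_ratio": 0.1,
}
HIGHER_IS_BETTER = {"throughput_gbps"}


def composite_scores(results: list[RunResult]) -> dict[str, float]:
    """Min-max normalize each metric across methods (1.0 = best), then weight and sum."""
    if not results:
        return {}
    scores = {r.name: 0.0 for r in results}
    for metric, weight in WEIGHTS.items():
        values = [getattr(r, metric) for r in results]
        lo, hi = min(values), max(values)
        for r in results:
            v = getattr(r, metric)
            if hi == lo:
                norm = 1.0  # every method ties on this metric
            elif metric in HIGHER_IS_BETTER:
                norm = (v - lo) / (hi - lo)
            else:
                norm = (hi - v) / (hi - lo)
            scores[r.name] += weight * norm
    return scores


if __name__ == "__main__":
    # Placeholder numbers for two unnamed methods, not measured results.
    runs = [
        RunResult("method_a", avg_fct_us=20.0, p99_fct_us=200.0,
                  throughput_gbps=90.0, pause_ratio=0.10),
        RunResult("method_b", avg_fct_us=30.0, p99_fct_us=100.0,
                  throughput_gbps=80.0, pause_ratio=0.00),
    ]
    for name, score in sorted(composite_scores(runs).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.2f}")
```

With these placeholder inputs, `method_a` comes out ahead because it wins on mean completion time and throughput (0.6 of the weight), while `method_b` wins only on tail latency and pause ratio (0.4). Note that min-max normalization makes every score relative to the set of methods being compared, so adding or removing a method can shift the others' scores; normalizing against a fixed baseline (for example, an uncongested reference run) avoids this, at the cost of having to define that baseline.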