• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2022, Vol. 44, Issue (03): 390-395.

• High Performance Computing •

Research on PCIe topology application of AI server

LIN Kai-zhi1,2, ZONG Yan-yan1,2, SUN Long-ling1,2

  1. State Key Laboratory of High-end Server & Storage Technology, Beijing 100085, China
  2. Inspur (Beijing) Electronic Information Industry Co., Ltd., Beijing 100085, China

  • Received: 2021-02-22 Revised: 2021-06-18 Accepted: 2022-03-25 Online: 2022-03-25 Published: 2022-03-24
  • Supported by:
    National Key Research and Development Program of China (2017YFB1001700)

Abstract: The CPU+GPU architecture is widely used in AI servers to collect and process data for big data, cloud computing, artificial intelligence, and other fields. Three CPU+GPU PCIe topologies are in common use: Balance Mode, Common Mode, and Cascade Mode. Because practical application scenarios are complex and diverse, the applicability of each topology needs to be studied. This paper first briefly introduces the three topologies and then designs experiments that analyze their applicability through tests of point-to-point bandwidth and latency, double-precision floating-point performance, and deep learning inference performance. The results provide guidance for selecting a PCIe topology for AI servers in practical applications.

Key words: AI server, PCIe topology, application scenario