• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (11): 1901-1909.

• High Performance Computing •

Porting and optimization of GROMACS 2020 on the ROCm platform

ZHANG Yu-zhou1 (张驭洲), CAO Wu-di1 (曹武迪), BU Jing-de1 (卜景德), TAN Guang-ming2 (谭光明), JI Qing1 (吉青)

  1. Joint Laboratory of Advanced Computing for Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China
  2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

  • Received: 2020-09-18  Revised: 2020-12-02  Accepted: 2021-11-25  Online: 2021-11-25  Published: 2021-11-19
  • Supported by:
    National Key R&D Program of China (2018YFB0204400)

Abstract: GROMACS is a widely used open-source molecular dynamics simulation package whose GPU acceleration currently relies mainly on NVIDIA GPUs through CUDA. ROCm is an open-source high-performance heterogeneous computing platform. Using HIP, the programming language of the ROCm platform, this paper presents the first complete port of the GROMACS 2020 series to ROCm. On an MI50 GPU, with a complex ionic liquid simulation as the target case, the ported code was profiled with the GPU performance analysis tool rocprof. Guided by the hardware characteristics of the MI50, the bonded force kernel, the PME kernel for electrostatic forces, and the short-range non-bonded force kernel were optimized in turn. For the target case, the optimized code achieves an overall speedup of about 2.8 times over the initial ported version, delivers 60.5% higher performance on the MI50 than the original GROMACS OpenCL code, and achieves a speedup of about 2.7 times over the CPU-only version. In single-node tests of two other representative cases and in multi-node scalability tests of the ionic liquid case, the optimized code also shows good performance gains, which indicates that the optimizations are reasonably general.

Key words: molecular dynamics, GROMACS, ROCm (Radeon Open Compute), application porting, performance optimization