Porting and optimization of GROMACS 2020 on ROCm platform

Computer Engineering & Science, 2021, Vol. 43, Issue (11): 1901-1909.

Funding: National Key Research and Development Program of China (2018YFB0204400)

ZHANG Yu-zhou1, CAO Wu-di1, BU Jing-de1, TAN Guang-ming2, JI Qing1

(1. Joint Laboratory of Advanced Computing for Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190;
2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)

Received: 2020-09-18 Revised: 2020-12-02 Online: 2021-11-25 Published: 2021-11-19

Abstract: GROMACS is a widely used open-source molecular dynamics simulation package that currently relies mainly on CUDA to accelerate computation on NVIDIA GPUs. ROCm is an open-source high-performance heterogeneous computing platform. Using HIP, the programming language of the ROCm platform, this paper presents the first complete port of the GROMACS 2020 series to ROCm. On an MI50 GPU, with a complex ionic liquid simulation as the target case, the ported code was profiled with the GPU profiling tool rocprof. Guided by the hardware characteristics of the MI50, the bonded force kernel, the PME kernel for electrostatic forces, and the short-range non-bonded force kernel were optimized in turn. After optimization, the target case runs about 2.8 times faster than with the initial ported version. Performance on the MI50 is 60.5% higher than that of the original GROMACS OpenCL code and about 2.7 times that of the CPU-only version. In single-node tests of two other representative cases and in multi-node scalability tests of the ionic liquid case, the optimized code also shows good performance gains, which indicates that the optimizations are reasonably general.

Key words: molecular dynamics, GROMACS, ROCm (Radeon Open Compute), application porting, performance optimization
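
The speedup figures quoted in the abstract can be placed on a common scale. The sketch below is illustrative arithmetic only. It assumes that all three figures refer to the same ionic liquid case and the same throughput metric (for example ns/day), which the abstract implies but does not state explicitly. The function and key names are hypothetical and do not come from the paper.

```python
# Illustrative arithmetic only. Names are hypothetical; the three figures
# (2.8x, 60.5%, 2.7x) are taken from the abstract.
# Assumption: every ratio is measured on the same case and throughput metric.

def relative_throughput(speedup_over_initial, gain_over_opencl, speedup_over_cpu):
    """Return each version's throughput relative to the optimized HIP code (= 1.0)."""
    return {
        "optimized_hip": 1.0,
        # "about 2.8 times faster than the initial ported version"
        "initial_hip_port": 1.0 / speedup_over_initial,
        # "60.5% higher than the original OpenCL code" means a ratio of 1.605
        "original_opencl": 1.0 / (1.0 + gain_over_opencl),
        # "about 2.7 times that of the CPU-only version"
        "cpu_only": 1.0 / speedup_over_cpu,
    }

rel = relative_throughput(
    speedup_over_initial=2.8,
    gain_over_opencl=0.605,
    speedup_over_cpu=2.7,
)
for name, value in rel.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, the initial port would sit at roughly 36% of the optimized throughput, the CPU-only version at roughly 37%, and the original OpenCL code at roughly 62%. Because the abstract reports rounded values ("about"), these ratios are only approximate.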

ZHANG Yu-zhou, CAO Wu-di, BU Jing-de, TAN Guang-ming, JI Qing. Porting and optimization of GROMACS 2020 on ROCm platform[J]. Computer Engineering & Science, 2021, 43(11): 1901-1909.
