Optimizations of mesh renumbering for unstructured finite-volume computational fluid dynamics

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (10): 1721-1729.

ZHANG Yong1, ZHANG Xi2, WAN Yun-bo1, HE Xian-yao1, ZHAO Zhong1, LU Yu-tong2

(1. Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China;
2. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China)

Received: 2022-03-14 Revised: 2022-05-25 Online: 2022-10-25 Published: 2022-10-28
Supported by:
National Numerical Windtunnel Project (NNW2019ZT6-B18); Program for Guangdong Introducing Innovative and Entrepreneurial Teams (2016ZT06D211)

Abstract: Mesh renumbering (reordering) is an important means of improving the efficiency of parallel computational fluid dynamics (CFD) on CPUs and GPUs. Because the data of an unstructured mesh are stored irregularly, indirect data access incurs memory access latency. The effect is stronger in GPU parallel computing, where indirect access causes non-aligned memory access and amplifies the impact of that latency. To address this, the Reverse Cuthill-McKee (RCM) mesh reordering method is used to improve the data locality of unstructured meshes, and a face renumbering method is designed. Test cases show that mesh reordering does not affect the final computed results. The impact of mesh reordering on the performance of the unstructured solver on CPU and GPU is compared and analyzed: on the CPU, the running time of some hotspot functions is reduced by about 20% and the overall running time by 15%~20%; on the GPU, the running time of most hotspot functions is reduced by 35%~60% and the overall running time by about 40%.

Key words: unstructured mesh, mesh renumbering, GPU parallel computing, computational fluid dynamics (CFD), PHengLEI
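
The two steps named in the abstract, Reverse Cuthill-McKee (RCM) cell reordering followed by face renumbering, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the face renumbering scheme, so the sketch assumes a common choice (sort faces by their renumbered owner and neighbor cells). The names `rcm_order`, `renumber_mesh` and `bandwidth` are hypothetical, and the mesh is reduced to a list of interior faces, each given as a pair of adjacent cell indices.

```python
from collections import deque


def rcm_order(n_cells, faces):
    """Return old cell ids listed in Reverse Cuthill-McKee order.

    faces: list of (cell_a, cell_b) pairs, one per interior face.
    """
    adj = [set() for _ in range(n_cells)]
    for a, b in faces:
        adj[a].add(b)
        adj[b].add(a)
    degree = [len(s) for s in adj]

    visited = [False] * n_cells
    order = []
    # Visit every connected component, seeding each one with a
    # lowest-degree cell (a simple stand-in for a peripheral-node search).
    for start in sorted(range(n_cells), key=lambda c: (degree[c], c)):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            order.append(cell)
            # Cuthill-McKee: enqueue unvisited neighbors by increasing degree.
            for nb in sorted(adj[cell], key=lambda c: (degree[c], c)):
                if not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
    order.reverse()  # the "reverse" in Reverse Cuthill-McKee
    return order


def renumber_mesh(n_cells, faces):
    """Renumber cells with RCM, then renumber faces (assumed scheme)."""
    order = rcm_order(n_cells, faces)
    new_id = [0] * n_cells
    for new, old in enumerate(order):
        new_id[old] = new
    # Assumed face renumbering: store the smaller cell id first and sort
    # faces by (owner, neighbor). A real solver would also flip the face
    # normal whenever the two cells of a face are swapped.
    new_faces = sorted(
        (min(new_id[a], new_id[b]), max(new_id[a], new_id[b]))
        for a, b in faces
    )
    return new_id, new_faces


def bandwidth(faces):
    """Largest index distance between two cells that share a face."""
    return max(abs(a - b) for a, b in faces)


# A chain of 8 cells whose labels are scrambled along the chain.
labels = [3, 7, 0, 5, 1, 6, 2, 4]
faces = [(labels[i], labels[i + 1]) for i in range(len(labels) - 1)]
new_id, new_faces = renumber_mesh(len(labels), faces)
print(bandwidth(faces), "->", bandwidth(new_faces))  # prints "7 -> 1"
```

In this toy case the largest index gap between face-adjacent cells drops from 7 to 1, which is the locality property that makes indirect accesses through face-to-cell arrays touch nearby memory. For production use, SciPy provides `scipy.sparse.csgraph.reverse_cuthill_mckee`, which operates on a sparse adjacency matrix.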


ZHANG Yong, ZHANG Xi, WAN Yun-bo, HE Xian-yao, ZHAO Zhong, LU Yu-tong. Optimizations of mesh renumbering for unstructured finite-volume computational fluid dynamics[J]. Computer Engineering & Science, 2022, 44(10): 1721-1729.
