• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (10): 1721-1729.

• High Performance Computing •

Optimizations of mesh renumbering for unstructured finite-volume computational fluid dynamics

ZHANG Yong1, ZHANG Xi2, WAN Yun-bo1, HE Xian-yao1, ZHAO Zhong1, LU Yu-tong2

  1. Computational Aerodynamics Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China
  2. School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
  • Received: 2022-03-14  Revised: 2022-05-25  Accepted: 2022-10-25  Online: 2022-10-25  Published: 2022-10-28
  • Supported by:
    National Numerical Windtunnel Project (NNW2019ZT6-B18); Program for Guangdong Introducing Innovative and Entrepreneurial Teams (2016ZT06D211)

Abstract: Mesh renumbering (reordering) is an important means of improving the CPU and GPU parallel efficiency of computational fluid dynamics (CFD) codes. Because unstructured meshes store data irregularly, indirect data access incurs memory access latency. The problem is more severe on GPUs, where indirect access causes misaligned memory accesses that amplify the effect of this latency. To address this, the Reverse Cuthill-McKee (RCM) mesh reordering method is used to improve the data locality of unstructured meshes, and a face renumbering method is designed. Test cases show that mesh reordering does not affect the final computed results. The effect of mesh reordering on the performance of an unstructured solver is compared on CPU and GPU. On the CPU, the running time of some hotspot functions is reduced by about 20%, and the overall running time by 15%~20%. On the GPU, the running time of most hotspot functions is reduced by 35%~60%, and the overall running time by about 40%.

Key words: unstructured mesh, mesh renumbering, GPU parallel computing, computational fluid dynamics, PHengLEI (风雷) software
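
The reordering idea summarized in the abstract can be illustrated with a minimal pure-Python sketch. It is not the solver's implementation: the function names (`rcm_order`, `renumber`, `bandwidth`) are hypothetical, the mesh is reduced to a list of (owner, neighbor) cell pairs per interior face, and the face ordering rule (sort faces by their renumbered cell pair) is an assumption, since the abstract does not describe the paper's face renumbering method in detail. The sketch numbers cells with Reverse Cuthill-McKee and reports the largest index gap across any face, a rough proxy for data locality.

```python
from collections import deque

def rcm_order(n_cells, faces):
    """Return old cell ids listed in Reverse Cuthill-McKee order."""
    adj = [set() for _ in range(n_cells)]
    for a, b in faces:
        adj[a].add(b)
        adj[b].add(a)
    degree = [len(s) for s in adj]
    visited = [False] * n_cells
    order = []
    # Start each connected component from an unvisited cell of lowest degree.
    for start in sorted(range(n_cells), key=lambda c: (degree[c], c)):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            order.append(cell)
            # Cuthill-McKee visits neighbors in order of increasing degree.
            for nb in sorted(adj[cell], key=lambda c: (degree[c], c)):
                if not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
    order.reverse()  # the "reverse" step of RCM
    return order

def renumber(n_cells, faces):
    """Renumber cells with RCM, then sort faces by their new cell pair.

    The face sort is an assumed, commonly used rule, not the paper's method.
    """
    new_id = [0] * n_cells
    for new, old in enumerate(rcm_order(n_cells, faces)):
        new_id[old] = new
    new_faces = sorted(
        (min(new_id[a], new_id[b]), max(new_id[a], new_id[b])) for a, b in faces
    )
    return new_id, new_faces

def bandwidth(faces):
    """Largest cell-index gap across any face (smaller means better locality)."""
    return max(abs(a - b) for a, b in faces)

# A chain of 8 cells whose ids were scrambled, e.g. by a mesh generator.
chain = [0, 5, 2, 7, 1, 6, 3, 4]
faces = list(zip(chain, chain[1:]))
new_id, new_faces = renumber(8, faces)
print("bandwidth before:", bandwidth(faces))      # 6
print("bandwidth after: ", bandwidth(new_faces))  # 1
```

After renumbering, cells that share a face have adjacent indices, so a loop over faces touches nearby memory for both cells. On a GPU this is the property that lets neighboring threads read neighboring addresses.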
