  • Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2025, Vol. 47, Issue (12): 2119-2128.

• High Performance Computing

  • Funding:
    National Natural Science Foundation of China (62032023, 42104078, 61902411)

Incompressible fluid simulation algorithm optimization of OpenFOAM on Tianhe supercomputing

LIU Zhongmin,ZHANG Xiang,MA Di,SUN Yang,ZHOU Lei,QIU Qi,GONG Chunye   

  1. School of Computer Science and Technology, Changsha University of Science & Technology, Changsha 410073;
  2. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073;
  3. Unit 32286, Changsha 410073;
  4. National Supercomputer Center in Tianjin, Tianjin 300457, China
  • Received: 2024-04-29  Revised: 2024-09-24  Online: 2025-12-25  Published: 2026-01-06

Abstract: The incompressible fluid simulation solvers in the open-source fluid dynamics software OpenFOAM are portable across platforms, but their performance optimizations mostly target supercomputers built on established architectures such as Intel. As a result, these algorithms cannot exploit the vector (SIMD) parallelism of the ARM architecture in the Tianhe supercomputing system. To address this, this paper takes the incompressible fluid simulation solvers as its subject and applies ARM vectorization techniques to two of their solution algorithms, the symmetric Gauss-Seidel (SGS) method and the diagonal incomplete Cholesky preconditioned conjugate gradient (DIC-PCG) method, to improve the solvers' runtime efficiency. To enable vectorization, the paper analyzes the relationships between neighboring grid cells within a single iteration of both algorithms and finds that the number of such neighboring cells is at most two and that there are no data dependencies between them. Using this prior knowledge, the original code is modified at minimal cost, by adding only four lines of if-else statements, so that the neighboring cells are processed with vector instructions and the algorithms run faster. Experiments under various configurations show that the improved algorithms reach a maximum single-core speedup of 1.75 and a maximum multi-core speedup of 149.16, while still maintaining a parallel efficiency of 29.13%.
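The central observation above, that within one iteration a cell has at most two neighboring cells whose updates do not depend on each other, can be illustrated with a forward Gauss-Seidel sweep over an LDU-addressed matrix of the kind OpenFOAM stores (faces sorted by owner cell, with `upper`/`lower` coefficients per face). This is a minimal hypothetical sketch, not the paper's code: the `LduMatrix` struct, the `gaussSeidelSweep` function, and the "exactly two upper neighbors" fast-path condition are illustrative assumptions. On AArch64 with NEON the two face products share one `float64x2_t` register; elsewhere the same pair is written as unrolled scalar code. A symmetric sweep (SGS) would follow this forward pass with a mirrored backward pass, which is omitted.

```cpp
// Hypothetical sketch (not the paper's code): forward Gauss-Seidel sweep over an
// OpenFOAM-style LDU matrix with an if-else fast path for cells that own exactly
// two upper-neighbour faces. The two neighbour updates are independent.
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical container mirroring the lduMatrix layout (faces sorted by owner).
struct LduMatrix {
    std::vector<double> diag;     // a_ii, one entry per cell
    std::vector<double> upper;    // a_ij: row = owner i, column = neighbour j, one per face
    std::vector<double> lower;    // a_ji: row = neighbour j, column = owner i, one per face
    std::vector<int> upperAddr;   // neighbour cell j of each face
    std::vector<int> ownerStart;  // faces owned by cell i: [ownerStart[i], ownerStart[i+1])
};

// One forward sweep. bPrime is taken by value because the sweep consumes it:
// new values of lower-numbered cells are pushed into it as they are computed.
void gaussSeidelSweep(const LduMatrix& A, std::vector<double>& psi, std::vector<double> bPrime)
{
    const std::size_t nCells = A.diag.size();
    assert(A.ownerStart.size() == nCells + 1 && psi.size() == nCells && bPrime.size() == nCells);

    for (std::size_t i = 0; i < nCells; ++i) {
        const int fStart = A.ownerStart[i];
        const int fEnd = A.ownerStart[i + 1];
        double psii = bPrime[i];

        if (fEnd - fStart == 2) {
            // Fast path: two upper neighbours handled as one 2-lane operation.
            const int f0 = fStart, f1 = fStart + 1;
            const int u0 = A.upperAddr[f0], u1 = A.upperAddr[f1];
#if defined(__aarch64__) && defined(__ARM_NEON)
            const float64x2_t up = vld1q_f64(&A.upper[f0]);                           // contiguous coefficients
            const float64x2_t nb = vsetq_lane_f64(psi[u1], vdupq_n_f64(psi[u0]), 1);  // gathered neighbours
            psii -= vaddvq_f64(vmulq_f64(up, nb));
            psii /= A.diag[i];
            const float64x2_t dl = vmulq_n_f64(vld1q_f64(&A.lower[f0]), psii);
            bPrime[u0] -= vgetq_lane_f64(dl, 0);  // u0 != u1, so these two scatters
            bPrime[u1] -= vgetq_lane_f64(dl, 1);  // do not depend on each other
#else
            psii -= A.upper[f0] * psi[u0] + A.upper[f1] * psi[u1];
            psii /= A.diag[i];
            bPrime[u0] -= A.lower[f0] * psii;
            bPrime[u1] -= A.lower[f1] * psii;
#endif
        } else {
            // General path: any number of upper neighbours, one face at a time.
            for (int f = fStart; f < fEnd; ++f) psii -= A.upper[f] * psi[A.upperAddr[f]];
            psii /= A.diag[i];
            for (int f = fStart; f < fEnd; ++f) bPrime[A.upperAddr[f]] -= A.lower[f] * psii;
        }
        psi[i] = psii;
    }
}
```

On a 2×2 mesh numbered row by row (cells 0, 1 / 2, 3) with faces (0,1), (0,2), (1,3), (2,3), cell 0 owns two faces and takes the fast path, while cells 1 and 2 fall through to the general loop. With diagonal 4, off-diagonal coefficients −1, source (1, 2, 3, 4) and a zero initial guess, one sweep yields psi = (0.25, 0.5625, 0.8125, 1.34375). Pairing two faces of the same owner keeps the `upper` and `lower` loads contiguous, while the two `psi` values still have to be gathered lane by lane. OpenFOAM's DIC preconditioner walks the same owner-ordered faces in its forward and backward substitutions, so a similar pairing plausibly applies there as well; that case is not shown.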


Key words: OpenFOAM, incompressible fluid simulation solver, performance optimization, single instruction multiple data (SIMD), loop unrolling, inline assembly, speedup
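The abstract reports both a maximum multi-core speedup (149.16) and a parallel efficiency (29.13%) without restating their definitions. Assuming the usual definitions, speedup S_p = T_1 / T_p and efficiency E_p = S_p / p on p cores (the baseline time T_1 is an assumption here), the two figures pin down the core count of that run. The value of 512 cores below is inferred from the arithmetic, not stated in the abstract.

```latex
\begin{gathered}
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}
\;\Longrightarrow\;
p = \frac{S_p}{E_p} = \frac{149.16}{0.2913} \approx 512.05 \\
0.29125 \le E_p < 0.29135
\;\Longrightarrow\;
511.96 \approx \frac{149.16}{0.29135} < p \le \frac{149.16}{0.29125} \approx 512.14
\;\Longrightarrow\; p = 512
\end{gathered}
```

If the paper normalizes efficiency against a different baseline, for example a multi-core reference run, this inference does not hold.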