• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2012, Vol. 34 ›› Issue (8): 147-153.

• Article •

GPU Parallel Optimization of the Numerical Ocean Circulation Model POP

郭松,窦勇,雷元武   

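  1. (State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, Hunan 410073)
  • Received: 2012-04-28 Revised: 2012-06-11 Online: 2012-08-25 Published: 2012-08-25
  • Supported by:

    Natural Science Foundation Distinguished Young Scholars Fund project (61125201); Ministry of Education Doctoral Program Fund project (60911062)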

GPU Parallel Optimization of the Oceanic General Circulation Model POP

GUO Song, DOU Yong, LEI Yuanwu

  1. (State Key Laboratory of High Performance Computing,
    National University of Defense Technology, Changsha 410073, China)
  • Received: 2012-04-28 Revised: 2012-06-11 Online: 2012-08-25 Published: 2012-08-25

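Abstract (translated from the Chinese):

POP is a global ocean circulation model widely used in ocean research and climate prediction. However, as model resolution increases, POP's demand for computing power grows geometrically, which limits the model's development. Based on an analysis of POP's principles and characteristics, this paper ports the POP model to the GPU platform using the CUDA Fortran programming model and applies several optimization strategies: multi-level parallelism that combines parallelism across grid blocks with parallelism within each grid block, coalesced access to global memory, reduced use of local memory, use of registers to increase data reuse, and a larger share of code running on the GPU to reduce communication between the CPU and the GPU. Experimental results show that, compared with the serial program and the 6-process parallel program running on an Intel Xeon X5675 six-core processor, GPUPOP achieves speedups of 8.47 times and 1.5 times, respectively.

Keywords: CUDA GPU, POP model, GPUPOP, CUDA Fortran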

Abstract:

POP is a global ocean circulation model that is widely used in ocean research and climate prediction. As the model resolution increases, its demand for computing power grows geometrically, which limits the development of the POP ocean model. Based on an analysis of the equations and numerical characteristics of the POP ocean model, this paper ports the model to the GPU platform with the CUDA Fortran programming model. It adopts hybrid parallelism, implements coalesced access to global memory, reduces the use of local memory, improves data reuse with registers, and enlarges the portion of code executing on the GPU to minimize communication between the CPU and the GPU. Experiments show that GPUPOP running on one NVIDIA Tesla C2070 card achieves speedups of up to 8.47 times and 1.5 times, respectively, compared with the serial program and the six-process MPI program running on the Intel Xeon X5675 CPU.

Key words: CUDA GPU; parallel ocean program model; GPUPOP; CUDA Fortran
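
As a rough reading of the two reported speedups, the arithmetic below estimates how the six-process CPU run compares with the serial run. It assumes both speedups were measured on the same workload against the same serial baseline, which the abstract does not state, and the symbols $S$ and $E$ are introduced here only for illustration.

```latex
\[
S_{\text{6-proc}} \approx \frac{S_{\text{GPU vs. serial}}}{S_{\text{GPU vs. 6-proc}}}
= \frac{8.47}{1.5} \approx 5.6,
\qquad
E_{\text{6-proc}} \approx \frac{S_{\text{6-proc}}}{6} \approx \frac{5.6}{6} \approx 0.94 .
\]
```

Because 1.5 is a rounded figure, these values are approximate; under the stated assumption they suggest that the six-process CPU baseline already scales at roughly 94% parallel efficiency.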