• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (12): 87-93.

• Articles •

海洋模式FVCOM2.6并行计算性能TAU分析

宋倩,胡松

  1. (上海海洋大学海洋生态系统与环境实验室,上海 201306)
  • 收稿日期:2011-04-14 修回日期:2011-09-18 出版日期:2011-12-24 发布日期:2011-12-25

Analysis of the Parallel Computing Performance of Ocean Model FVCOM2.6 Using TAU

SONG Qian, HU Song

  1. (Marine Ecosystem and Environmental Laboratory, Shanghai Ocean University, Shanghai 201306, China)
  • Received:2011-04-14 Revised:2011-09-18 Online:2011-12-24 Published:2011-12-25

摘要:

本文利用并行程序分析软件Tuning and Analysis Utilities(TAU),对基于Message Passing Interface(MPI)的海洋环流模式Finite-Volume Coastal Ocean Model(FVCOM)2.6版本进行并行性能分析。在Linux集群(Intel Xeon CPU E5450,10G InfiniBand)上,使用不同进程数分别对低分辨率(网格节点数为2 108和10 378)、高分辨率(网格节点数为15 347和26 033)的深沪湾潮汐算例进行测试。结果表明,模式单进程运行时,平流项子程序所占运行时间比例较大;模式多进程运行时,通过比较不同算例的加速比,发现算例分辨率对模式的并行性能有较大影响。在本次测试硬件条件下,算例存在某一最佳进程数,低分辨率为32,高分辨率为64,最佳进程数随分辨率增高而增高。到达最佳进程数后,随着进程数增加,模式运行时间反而增加。TAU分析表明,主要是由于MPI_Waitany程序时间比例增加以致阻塞时间占模式运算总时间的比例增大,从而为FVCOM并行性能进一步改善提供参考。

关键词: FVCOM, TAU, 性能分析, 并行计算

Abstract:

This study uses Tuning and Analysis Utilities (TAU) to analyze the parallel performance of version 2.6 of the unstructured-grid Finite-Volume Coastal Ocean Model (FVCOM), which is parallelized with the Message Passing Interface (MPI). Tidal test cases for Shenhu Bay at low resolution (2,108 and 10,378 grid nodes) and high resolution (15,347 and 26,033 grid nodes) were run with varying numbers of processes on a Linux cluster (Intel Xeon E5450 CPUs, 10G InfiniBand). The results show that, in single-process runs, the advection subroutines account for a large share of the run time. In multi-process runs, comparing the speedup across test cases shows that grid resolution strongly affects the model's parallel performance. Under the hardware conditions of this study, each test case has an optimal number of processes: 32 for the low-resolution cases and 64 for the high-resolution cases, so the optimum rises with resolution. Beyond the optimal number, adding more processes increases the total run time instead of reducing it. The TAU analysis shows that this is mainly because the share of time spent in MPI_Waitany grows, so that blocking time takes up a larger fraction of the model's total run time. These findings provide a reference for further improving the parallel performance of FVCOM.

Key words: FVCOM; TAU; performance analysis; parallel computing
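
The speedup comparison and the "optimal number of processes" described in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions, speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p; the abstract does not state the formulas used. The function names and the timings are hypothetical, invented for illustration, and are not measurements from the paper.

```python
def speedup_table(run_times):
    """Return {processes: (speedup, efficiency)} relative to the single-process run.

    run_times maps process count -> wall-clock seconds and must include key 1.
    """
    t1 = run_times[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in sorted(run_times.items())}


def optimal_processes(run_times):
    """Process count with the shortest wall-clock time (smallest count wins ties)."""
    return min(sorted(run_times), key=lambda p: run_times[p])


# Hypothetical timings in seconds, invented for illustration only;
# they are NOT measurements from the paper.
example_times = {1: 640.0, 2: 330.0, 4: 172.0, 8: 95.0, 16: 58.0, 32: 44.0, 64: 51.0}

table = speedup_table(example_times)
best = optimal_processes(example_times)

for p, (s, e) in table.items():
    print(f"{p:>3} processes: speedup {s:6.2f}, efficiency {e:5.2f}")
print("shortest run time at", best, "processes")
```

With timings of this shape, the run time stops falling once communication and waiting outweigh the extra compute, which is the pattern the abstract attributes to the growing share of time in MPI_Waitany.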