Analysis of the Parallel Computing Performance of Ocean Model FVCOM2.6 Using TAU

J4, 2011, Vol. 33, Issue (12): 87-93.

SONG Qian, HU Song

(Marine Ecosystem and Environmental Laboratory, Shanghai Ocean University, Shanghai 201306, China)

Received: 2011-04-14 Revised: 2011-09-18 Online: 2011-12-24 Published: 2011-12-25

Abstract

This study uses the parallel performance analysis toolkit Tuning and Analysis Utilities (TAU) to analyze the parallel performance of version 2.6 of the Finite Volume Coastal Ocean Model (FVCOM), an unstructured-grid ocean circulation model parallelized with the Message Passing Interface (MPI). On a Linux cluster (Intel Xeon E5450 CPUs, 10G InfiniBand), tidal test cases for Shenhu Bay at low resolution (2,108 and 10,378 grid nodes) and high resolution (15,347 and 26,033 grid nodes) were run with varying numbers of processes. The results show that, in single-process runs, the advection subroutines account for a large share of the run time. In multi-process runs, a comparison of the speedups of the different cases shows that grid resolution strongly affects the model's parallel performance. Under the hardware conditions of this test, each case has an optimal number of processes: 32 for the low-resolution cases and 64 for the high-resolution cases, so the optimal number increases with resolution. Beyond the optimal number, adding processes increases the run time instead of reducing it. The TAU analysis indicates that this is mainly because the share of time spent in MPI_Waitany grows, so that blocking time takes up a larger proportion of the model's total run time. These findings provide a reference for further improving the parallel performance of FVCOM.

Key words: FVCOM; TAU; performance analysis; parallel computing
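The speedup comparison described in the abstract can be illustrated with a minimal Python sketch. It assumes the usual definitions of speedup (single-process time divided by p-process time) and parallel efficiency (speedup divided by p); the abstract itself does not state which definitions were used. The function names and all timing values below are hypothetical placeholders, not measurements from the paper.

```python
from typing import Dict, Tuple


def speedup_table(times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map process count -> (speedup, efficiency), relative to the 1-process run."""
    t1 = times[1]  # wall-clock time of the single-process run
    return {p: (t1 / t, t1 / (t * p)) for p, t in sorted(times.items())}


def optimal_processes(times: Dict[int, float]) -> int:
    """Return the process count with the shortest wall-clock time."""
    return min(times, key=times.get)


# HYPOTHETICAL wall-clock times in seconds, keyed by number of MPI processes.
# These values are invented for illustration and are NOT results from the study.
hypothetical_times = {1: 1000.0, 2: 520.0, 4: 270.0, 8: 145.0,
                      16: 85.0, 32: 60.0, 64: 75.0}

if __name__ == "__main__":
    for procs, (s, e) in speedup_table(hypothetical_times).items():
        print(f"{procs:3d} processes: speedup {s:6.2f}, efficiency {e:5.2f}")
    print("optimal process count:", optimal_processes(hypothetical_times))
```

With timings of this shape, the run time falls until some process count and then rises again, which is the pattern the abstract reports past the optimal number of processes.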

SONG Qian, HU Song. Analysis of the Parallel Computing Performance of Ocean Model FVCOM2.6 Using TAU[J]. J4, 2011, 33(12): 87-93.
