• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• High Performance Computing

Parallel implementation and optimization of a
large scale ocean data assimilation algorithm

WAN Weiqiang, XIAO Junmin, HONG Xuehai, TAN Guangming

  1. (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)
  • Received: 2018-10-08 Revised: 2018-12-18 Online: 2019-05-25 Published: 2019-05-25
  • Supported by:

    National Key Research and Development Program of China, Key Special Project (2016YFC1401706); National Natural Science Foundation of China (61802369)

Abstract:

Ocean data assimilation is an effective way to integrate ocean observation data into a numerical ocean model. Assimilated ocean data are closer to the real state of the ocean, which makes them important for understanding and studying the ocean. We design a general parallel implementation method for ocean data assimilation based on domain decomposition. Building on it, we propose a new parallel algorithm based on IO proxies. First, the IO proxy processes read the data in parallel. Next, they split the data into blocks and send each block to the corresponding computation process. Once the computation processes have completed their local data assimilation, the IO proxy processes collect the assimilation results from them and write the results to disk. The main advantage of this method is that only the IO proxy processes perform IO, instead of all processes taking part in IO as in the traditional approach (direct parallel IO). This prevents a large number of processes from accessing the disk simultaneously and thus avoids the waiting caused by process queuing. Test results on the Tianhe-2 cluster show that, for data assimilation at 1-degree resolution on 425 cores, the total running time of the parallel implementation is 9.1 s, a speedup of nearly 38 times over the traditional serial program. In addition, for data assimilation at 0.1-degree resolution, the IO-proxy-based parallel assimilation algorithm still scales well on 10,000 cores, and its IO time can be limited to at most 1/9 of the direct parallel IO time.

Key words: ocean data assimilation, ensemble optimal interpolation (EnOI), domain decomposition, IO proxy node
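
The read, split, send, collect, and write sequence described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation. It assumes a single IO proxy (the paper uses several), uses threads in place of MPI computation processes, splits the grid by rows, and replaces the EnOI update with a placeholder that adds a constant increment. The names `split_rows`, `local_assimilate`, and `io_proxy_run` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(grid, n_blocks):
    """Domain decomposition (assumed row-wise): cut the grid into
    n_blocks contiguous row blocks of near-equal size."""
    base, extra = divmod(len(grid), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        stop = start + base + (1 if b < extra else 0)
        blocks.append(grid[start:stop])
        start = stop
    return blocks


def local_assimilate(block, obs_increment):
    """Placeholder for the local assimilation step. This is NOT EnOI;
    it only adds a constant increment to every grid value."""
    return [[value + obs_increment for value in row] for row in block]


def io_proxy_run(read_grid, write_grid, n_workers, obs_increment):
    """Single IO proxy: the only place where storage is read or written."""
    grid = read_grid()                      # proxy reads the input
    blocks = split_rows(grid, n_workers)    # proxy cuts it into blocks
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker stands in for one computation process; map() keeps
        # the results in block order, so they can be merged directly.
        results = list(
            pool.map(lambda blk: local_assimilate(blk, obs_increment), blocks)
        )
    merged = [row for block in results for row in block]  # proxy collects
    write_grid(merged)                      # proxy writes the output
    return merged


if __name__ == "__main__":
    # An in-memory dict stands in for the disk in this sketch.
    storage = {"in": [[float(r * 3 + c) for c in range(3)] for r in range(5)]}
    io_proxy_run(
        lambda: storage["in"],
        lambda g: storage.__setitem__("out", g),
        n_workers=2,
        obs_increment=0.5,
    )
    print(storage["out"])
```

The workers never touch `storage`; only the proxy does. This is the property the abstract credits with avoiding disk contention when the number of computation processes is large.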