• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2020, Vol. 42 ›› Issue (10, High Performance Special Issue): 1842-1851.


Parallel optimization of Tend_lin application on the Sunway TaihuLight supercomputer

姜尚志1, 唐生林2, 高希然2, 花嵘1, 陈莉2, 刘颖2
  

  1. College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China
  2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

  • Received: 2020-06-11 Revised: 2020-07-29 Accepted: 2020-10-25 Online: 2020-10-25 Published: 2020-10-23

Abstract:

Numerical simulation of the global atmospheric circulation is one of the main tools for understanding the formation and dynamic behavior of the global climate, and porting and optimizing such a complex application for large-scale heterogeneous platforms is a great challenge. Tend_lin is the hot spot of the dynamical core of IAP AGCM-4 (the fourth-generation IAP atmospheric general circulation model) and has a low compute-to-communication ratio. This paper ports Tend_lin to Sunway TaihuLight, a large-scale heterogeneous computing platform, using two different parallel application programming interfaces. It describes how to parallelize the program with AceMesh, a data-driven parallel application programming interface: the task parallelization of computation loops and MPI communication, how to relax the sharing of communication resources, and the differences in task mapping between a single-level task graph and a nested task graph. Experimental results show that the AceMesh version achieves a speedup of more than 2x over the OpenACC version when using 16 to 1,024 processes. The paper analyzes and explains the reasons for this performance improvement.
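
To make the idea of data-driven task parallelism concrete, the following is a minimal Python sketch of how a runtime can derive task dependences from the data each task declares it reads and writes. It is not AceMesh's API and not the paper's implementation: the `TaskGraph` class, the task names, and the buffer names (`u_boundary`, `sendbuf`, `recvbuf`, and so on) are hypothetical, and the MPI exchange is represented only as a named task, with no real communication.

```python
from collections import defaultdict


class TaskGraph:
    """Toy data-driven task graph (hypothetical, for illustration only).

    Each task declares the data it reads and writes; dependences are
    inferred from those declarations instead of being written by hand.
    """

    def __init__(self):
        self.tasks = []                        # task names, in program order
        self.deps = defaultdict(set)           # task index -> predecessor indices
        self.last_writer = {}                  # data name -> index of last writer
        self.readers = defaultdict(list)       # data name -> readers since last write

    def add_task(self, name, reads=(), writes=()):
        idx = len(self.tasks)
        self.tasks.append(name)
        for d in reads:                        # read-after-write
            if d in self.last_writer:
                self.deps[idx].add(self.last_writer[d])
        for d in writes:
            if d in self.last_writer:          # write-after-write
                self.deps[idx].add(self.last_writer[d])
            self.deps[idx].update(self.readers[d])  # write-after-read
        for d in reads:
            self.readers[d].append(idx)
        for d in writes:
            self.last_writer[d] = idx
            self.readers[d] = []
        return idx

    def waves(self):
        """Group tasks into levels; tasks in the same level are independent."""
        level = {}
        for idx in range(len(self.tasks)):     # predecessors always have lower indices
            level[idx] = 1 + max((level[p] for p in self.deps[idx]), default=-1)
        grouped = defaultdict(list)
        for idx, lv in level.items():
            grouped[lv].append(self.tasks[idx])
        return [grouped[lv] for lv in sorted(grouped)]


# A halo-exchange pattern with a computation loop split into interior and boundary.
graph = TaskGraph()
graph.add_task("pack_halo", reads=["u_boundary"], writes=["sendbuf"])
graph.add_task("exchange_halo", reads=["sendbuf"], writes=["recvbuf"])  # stands in for MPI
graph.add_task("compute_interior", reads=["u_interior"], writes=["t_interior"])
graph.add_task("compute_boundary", reads=["u_boundary", "recvbuf"], writes=["t_boundary"])

schedule = graph.waves()
print(schedule)
# [['pack_halo', 'compute_interior'], ['exchange_halo'], ['compute_boundary']]
```

In this sketch `compute_interior` has no dependence on the communication task, so a task scheduler would be free to run it while the exchange is in flight; only `compute_boundary` has to wait for `recvbuf`.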





Key words: global atmospheric general circulation model, high resolution, data driven task parallel language, OpenACC, MPI