• Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


Implementation and optimization of fast multipole
method on Sunway manycore processors
 

WANG Wu[1], WANG Shuyang[1,2], JIANG Jinrong[1], MENG Hongsong[3]

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  3. National Supercomputing Center in Wuxi, Wuxi 214072, China
     
  • Received: 2018-10-25  Revised: 2018-12-10  Online: 2019-07-25  Published: 2019-07-25

Abstract:

The fast multipole method (FMM) is an efficient numerical algorithm for solving the N-body problem, with applications in cosmology and molecular dynamics. The Sunway SW26010 is a heterogeneous manycore processor developed independently in China, with 260 cores organized into 4 core groups. We design and implement an FMM on the SW26010 manycore architecture and systematically optimize the performance of its kernel functions, especially the most time-consuming one, the particle pair interaction. The optimizations include asynchronous direct memory access (DMA), SIMD vectorization, loop unrolling, and inline assembly tuning. For the particle pair interaction kernel, the optimized code runs about 400 times faster than the unoptimized code on the host core, and the floating-point performance on each core group reaches 250 GFLOPS, which is 32.5% of the theoretical peak performance.
 

Key words: fast multipole method (FMM), heterogeneous manycore processor, N-body problem, performance optimization