• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (09): 1529-1538.

• High Performance Computing • Previous Articles     Next Articles

A scalable parallel structured matrix multiplication algorithm framework

LI Sheng-guo1,LIAO Xia2,YU Heng-biao1,HUANG Chun1,JIANG Hao1,LU Xi-yan1,WANG Hua-lin3,CHENG Li-zhi3   

  1. (1.College of Computer Science and Technology,National University of Defense Technology,Changsha 410073;
    2.Department of Computer Science and Technology,Tsinghua University,Beijing 100084;
    3.College of Science and Technology,National University of Defense Technology,Changsha 410073,China)
  • Received:2023-01-04 Revised:2023-07-04 Accepted:2024-09-25 Online:2024-09-25 Published:2024-09-19

Abstract: Structured matrices play an important role in scientific computing and engineering applications, such as Cauchy, Toeplitz, Vandermonde, and Hankel matrices. Although these matrices are dense, they can be expressed with only O(n)  parameters (generators), where n is the dimension of the matrix. The core idea of the algorithm in this paper is to use matrix generators to explicitly construct local matrix blocks of each process, thereby reducing communication overhead. Additionally, by leveraging the numerical low-rank property of these matrix blocks. This paper further minimize computational overhead. Consequently, the proposed parallel structured matrix multiplication algorithm framework can simultaneously reduce both computational and communication costs, making it suitable for matrix multiplication algorithms like Cannon, Fox, and PUMMA. Extensive numerical tests were conducted on the Tianhe-2 supercomputer, and the results demonstrate that the proposed algorithm achieves an 8.96× speedup compared to the PDGEMM function in ScaLAPACK. 

Key words: structured matrix, matrix multiplication, FFT, Cauchy matrix, Toeplitz matrix, distributed parallel