
Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (07): 1175-1184.

• High Performance Computing •

An SpMV method for irregular sparse matrices

SHI Yu1, DONG Pan1, ZHANG Li-jun2

  (1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073;
   2. 24 Brigade, 32228 Troop, Chinese People's Liberation Army, Fuzhou 350101, China)
  • Received: 2023-11-07  Revised: 2023-12-01  Accepted: 2024-07-25  Online: 2024-07-25  Published: 2024-07-18

Abstract: Sparse matrix-vector multiplication (SpMV) is a key kernel in high performance computing and also has significant applications in emerging deep learning workloads. Existing research on SpMV mostly targets square sparse matrices, while irregularly shaped sparse matrices (with unequal numbers of rows and columns) remain under-explored. Because their row and column counts differ, these matrices exhibit storage characteristics different from those of square sparse matrices, leaving room for further optimization. This paper therefore establishes an SpMV performance model for irregularly shaped sparse matrices and shows that the performance bottleneck lies in insufficient bandwidth for data exchange between the cache and memory. Two optimizations are then carried out: (1) based on the widely used CSR storage format, a new RCSR storage format is proposed, which transforms and compresses a performance-limiting array of the CSR format to make SpMV more efficient; (2) an optimized SpMV algorithm based on the RCSR format is designed that exploits the SIMD instruction-set extension of domestic processors. Both regular and irregular sparse matrices are tested on domestic Phytium processors. For regular sparse matrices, the combined application of the RCSR storage format, SIMD instructions, and OpenMP parallelization increases GFLOPS by 83.35% on average. For irregular sparse matrices, the performance improvement depends on the row-to-column ratio, and the optimization effect is more pronounced when the numbers of rows and columns differ.
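For reference, below is a minimal sketch in C of the standard CSR SpMV kernel that the paper takes as its starting point. It is not the paper's RCSR method: the abstract does not specify the RCSR transformation, so only the conventional CSR layout is shown, with the array names (row_ptr, col_idx, val) and the struct chosen here for illustration. The array the paper compresses is presumably the row-pointer array, but that is an assumption.

    /* Baseline CSR SpMV sketch (not the paper's RCSR method).
     * Arrays: row_ptr has length nrows + 1 and row i occupies
     * the nonzero range [row_ptr[i], row_ptr[i+1]); col_idx and val
     * store the column index and value of each nonzero. */
    #include <stddef.h>

    typedef struct {
        int     nrows;    /* number of rows (may differ from ncols for irregular matrices) */
        int     ncols;    /* number of columns */
        int    *row_ptr;  /* row start offsets, length nrows + 1 */
        int    *col_idx;  /* column index of each stored nonzero */
        double *val;      /* value of each stored nonzero */
    } csr_matrix;

    /* y = A * x, where x has length A->ncols and y has length A->nrows. */
    void spmv_csr(const csr_matrix *A, const double *x, double *y)
    {
        for (int i = 0; i < A->nrows; ++i) {
            double sum = 0.0;
            for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k) {
                sum += A->val[k] * x[A->col_idx[k]];
            }
            y[i] = sum;
        }
    }

The inner loop streams row_ptr, col_idx, and val from memory with little reuse, which is why, as the abstract argues, the cache-to-memory bandwidth becomes the bottleneck; compressing one of these arrays (as RCSR does) and vectorizing the inner loop with SIMD both attack that bottleneck.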

Key words: sparse matrix, irregular matrix, vector multiplication, multicore performance, performance optimization