• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (09): 1521-1528.

• High Performance Computing •

Performance evaluation and analysis of vectorized SpMV algorithm based on scratchpad memory

ZHANG Zong-mao, DONG De-zun, WANG Zi-cong, CHANG Jun-sheng, ZHANG Xiao-yun, WANG Shao-cong

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
  • Received: 2023-10-23 Revised: 2023-11-22 Accepted: 2024-09-25 Online: 2024-09-25 Published: 2024-09-19

Abstract: Scratchpad memory (SPM) is an on-chip high-speed memory with a simple structure, fixed access latency, and direct software control, and it is widely used in modern processor designs. Sparse matrix-vector multiplication (SpMV) is a critical computational kernel in high-performance computing, artificial intelligence, and other application domains. On traditional processors with multi-level caches, the irregular accesses to the dense input vector during SpMV often cause a large number of cache misses, which reduces the execution efficiency of the algorithm. To evaluate the performance impact of scratchpad memory on vectorized SpMV, this paper uses ARM's Scalable Vector Extension (SVE) instructions to vectorize the SpMV algorithm based on the compressed sparse row (CSR) format, stores the hot data, namely the dense input vector, in the scratchpad memory, and analyzes the performance of the vectorized SpMV algorithm on ARM-based processors integrated with scratchpad memory. Experiments are conducted on 2,562 sparse matrices from real-world applications using the gem5 simulator. The results show that, compared with a traditional processor architecture, running the vectorized SpMV algorithm on the architecture integrated with scratchpad memory achieves a maximum speedup of 7.45x and an average speedup of 1.11x.


Key words: sparse matrix-vector multiplication, scratchpad memory, compressed sparse row (CSR), ARM Scalable Vector Extension (SVE)
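
For readers unfamiliar with the CSR format, the irregular access pattern described in the abstract can be illustrated with a minimal scalar sketch. This is not the authors' SVE implementation; the function name `spmv_csr`, the array names `row_ptr`, `col_idx`, and `values`, and the small example matrix are all hypothetical and chosen only for illustration.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    Illustrative scalar reference only (names are assumptions, not
    taken from the paper).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # values[k] and col_idx[k] are read sequentially, but
            # x[col_idx[k]] is read at a data-dependent index (a gather).
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x4 example matrix:
# [[1, 0, 0, 2],
#  [0, 0, 3, 0],
#  [4, 0, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 3, 2, 0, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0, 4.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)
```

The inner loop shows why the dense input vector is the hot, irregularly accessed data: `values` and `col_idx` stream through memory in order, while the positions touched in `x` depend on the sparsity pattern of each row. A vectorized version would presumably replace the indexed read of `x` with a gather load, which is the access the abstract proposes to serve from scratchpad memory.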