DRM: A GPU-parallel SpMV storage format based on iterative merge strategy

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (03): 381-394.

• High Performance Computing • Previous Articles Next Articles

DRM: A GPU-parallel SpMV storage format based on iterative merge strategy

WANG Yu-hua1,2,HE Jun-fei1,ZHANG Yu-qi1,XU Yue-zhu1,CUI Huan-yu1

(1.College of Computer Science and Technology,Harbin Engineering University,Harbin 150001；
2.Modeling and Emulation in E-Government National Engineering Laboratory,Harbin 150001,China)

Received:2023-07-14 Revised:2023-09-18 Accepted:2024-03-25 Online:2024-03-25 Published:2024-03-15

Abstract

Abstract: Sparse matrix vector multiplication (SpMV) is of great significance in the solution of linear systems, and is one of the core problems in scientific computing and engineering practice. Its performance highly depends on the non-zero distribution of sparse matrices. Sparse diagonal matrices are a special type of sparse matrices, whose non-zero elements are densely arranged in the form of diagonals. For sparse diagonal matrices, scholars have proposed various storage formats on the GPU platform, which have improved SpMV performance, but still suffer from zero padding and load imbalance issues. To address these issues, a DRM (Divide-Rearrange & Merge) storage format is proposed. This format uses matrix partitioning strategies based on fixed threshold values and matrix reconstruction strategies based on iterative merging to achieve sparse zero padding and load balancing between blocks. Experimental results show that on the NVIDIA Tesla V100 platform, compared to DIA, HDC, HDIA, and DIA-Adaptive formats, the time performance is accelerated by 20.76, 1.94, 1.13, and 2.26 times, respectively, and the floating point performance is improved by 1.54, 5.28, 1.13, and 1.94 times, respectively.

Key words: GPU, SpMV, sparse diagonal matrix, zero padding, load balancing

WANG Yu-hua, HE Jun-fei, ZHANG Yu-qi, XU Yue-zhu, CUI Huan-yu. DRM: A GPU-parallel SpMV storage format based on iterative merge strategy[J]. Computer Engineering & Science, 2024, 46(03): 381-394.

[1]	WANG Xin, PENG Jian. Implementation and optimization of HYB-based SpMV on the new-generation Sunway architecture [J]. Computer Engineering & Science, 2023, 45(10): 1754-1762.
[2]	GU Yue, ZHAO Yin-liang. Implementation and optimization of sparse matrix vector multiplication based on RISC-V vector instruction [J]. Computer Engineering & Science, 2022, 44(01): 1-8.
[3]	WANG Li, GUO Zhen-hua, CAO Fang, GAO Kai, ZHAO Ya-qian, ZHAO Kun. An automatic model splitting strategy generation method for model parallel training [J]. Computer Engineering & Science, 2020, 42(09): 1529-1537.
[4]	TAN Zhaonian，JI Weixing，AKREM Benatia，GAO Jianhua,LI Anmin,WANG Yizhuo. An SpMV partitioning and optimization algorithm on heterogeneous computing platforms [J]. Computer Engineering & Science, 2019, 41(04): 590-597.
[5]	SU Jin-zhu，WU Gu-ming,JIA Xun. Design and implementation of large sparse matrix vector multiplication on FPGA over GF(2) [J]. Computer Engineering & Science, 2016, 38(08): 1530-1535.
[6]	YANG Wangdong1,2,LI Kenli2. Implementation and optimization of HYB based SpMV on CPU+GPU heterogeneous computing systems [J]. J4, 2016, 38(02): 202-209.

DRM: A GPU-parallel SpMV storage format based on iterative merge strategy

PDF

Knowledge

Abstract

Cite this article

share this article

Related Articles 6

Recommended Articles

Metrics

Comments