• A journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (03): 381-394.

• High Performance Computing •

  • Funding:
    National Natural Science Foundation of China (62072135)

DRM: A GPU-parallel SpMV storage format based on iterative merge strategy

WANG Yu-hua1,2, HE Jun-fei1, ZHANG Yu-qi1, XU Yue-zhu1, CUI Huan-yu1

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China;
    2. Modeling and Emulation in E-Government National Engineering Laboratory, Harbin 150001, China
  • Received:2023-07-14 Revised:2023-09-18 Accepted:2024-03-25 Online:2024-03-25 Published:2024-03-15


Abstract: Sparse matrix-vector multiplication (SpMV) is of great significance in solving linear systems and is one of the core problems in scientific computing and engineering practice. Its performance depends heavily on the non-zero distribution of the sparse matrix. Sparse diagonal matrices are a special class of sparse matrices whose non-zero elements are densely arranged along diagonals. For sparse diagonal matrices, the various storage formats proposed on the GPU platform have improved SpMV performance, but they still suffer from zero padding and load imbalance. To address these issues, a DRM (Divide-Rearrange & Merge) storage format is proposed. It uses a matrix partitioning strategy based on a fixed threshold and a matrix reconstruction strategy based on iterative merging to achieve minimal zero padding and load balance between blocks. Experimental results show that on the NVIDIA Tesla V100 platform, compared with the DIA, HDC, HDIA, and DIA-Adaptive formats, DRM achieves speedups of 20.76, 1.94, 1.13, and 2.26 times in execution time, and improvements of 1.54, 5.28, 1.13, and 1.94 times in floating-point performance, respectively.

Key words: GPU, SpMV, sparse diagonal matrix, zero padding, load balancing
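The zero-padding overhead of diagonal storage that DRM targets can be illustrated with a minimal CPU-side sketch of DIA-format SpMV (plain Python for clarity, not the authors' GPU implementation; the function and layout names are illustrative). Each stored diagonal occupies a fixed-length row, so slots that fall outside the matrix must be padded with zeros:

```python
def dia_spmv(offsets, data, x):
    """SpMV for an n x n matrix stored in DIA (diagonal) format.

    data[d][i] holds A[i, i + offsets[d]]; slots where i + offsets[d]
    falls outside [0, n) are zero padding and contribute nothing.
    """
    n = len(x)
    y = [0.0] * n
    for d, off in enumerate(offsets):
        for i in range(n):
            j = i + off
            if 0 <= j < n:  # skip padded slots outside the matrix
                y[i] += data[d][i] * x[j]
    return y

# 4x4 matrix with a main diagonal and one super-diagonal.
offsets = [0, 1]
data = [[1.0, 2.0, 3.0, 4.0],   # main diagonal
        [5.0, 6.0, 7.0, 0.0]]   # super-diagonal: one padded zero at the end
x = [1.0, 1.0, 1.0, 1.0]
print(dia_spmv(offsets, data, x))  # -> [6.0, 8.0, 10.0, 4.0]
```

For a matrix whose diagonals are dense this padding is negligible, but for short or scattered diagonals it wastes storage and memory bandwidth, and assigning whole diagonals to thread blocks of unequal work causes the load imbalance the abstract describes; DRM's fixed-threshold partitioning and iterative merging are aimed at both effects.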