NM-SpMM: A semi-structured sparse matrix multiplication algorithm for domestic heterogeneous vector processors

Computer Engineering & Science, 2024, Vol. 46, Issue (07): 1141-1150.

• High Performance Computing •

JIANG Jing-fei, HE Yuan-hong, XU Jin-wei, XU Shi-yao, QIAN Xi-fu

(National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

Received: 2023-11-07  Revised: 2023-12-15  Accepted: 2024-07-25  Online: 2024-07-25  Published: 2024-07-18

Abstract

Deep neural networks have achieved excellent results in natural language processing, computer vision, and other fields. As intelligent applications process ever more data and large models develop rapidly, the demands on deep neural network inference performance keep rising. The N:M semi-structured sparsity scheme has become one of the most actively studied techniques for balancing computing power demand against application quality. The domestic heterogeneous vector processor FT-M7032 offers considerable room for exploiting data parallelism and instruction parallelism in intelligent model processing. To address the challenge of computing N:M semi-structured sparse models with various sparsity patterns, a flexible, configurable sparse matrix multiplication algorithm, NM-SpMM, is proposed for FT-M7032. NM-SpMM designs an efficient compressed offset address (COA) sparse encoding format, which keeps the choice of semi-structured parameters from affecting sparse data access. Based on COA, NM-SpMM applies fine-grained optimizations to sparse matrix multiplication along different dimensions. Experimental results on a single FT-M7032 core show that NM-SpMM achieves a speedup of 1.73× to 21.00× over dense matrix multiplication, and 0.04× to 1.04× relative to an NVIDIA V100 GPU using cuSPARSE.
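
For readers unfamiliar with N:M semi-structured sparsity, the idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not specify the COA layout or the FT-M7032 kernels, so the function names (`prune_n_m`, `compress_n_m`, `nm_spmm`) and the "values plus in-group offsets" storage used here are hypothetical and stand in for a generic N:M format, not the paper's encoding.

```python
def prune_n_m(row, n, m):
    """Keep the n largest-magnitude entries in each group of m; zero the rest."""
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    out = []
    for g in range(0, len(row), m):
        group = row[g:g + m]
        keep = sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n]
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out


def compress_n_m(row, n, m):
    """Pack an N:M-sparse row into n values and n in-group offsets per group.

    Assumed generic layout (not the paper's COA format): groups with fewer
    than n nonzeros are padded with explicit zeros so every group is the
    same size.
    """
    values, offsets = [], []
    for g in range(0, len(row), m):
        group = row[g:g + m]
        nz = [i for i, v in enumerate(group) if v != 0.0]
        if len(nz) > n:
            raise ValueError("group violates the N:M constraint")
        pad = [i for i in range(m) if i not in nz][:n - len(nz)]
        kept = sorted(nz + pad)
        values.extend(group[i] for i in kept)
        offsets.extend(kept)
    return values, offsets


def nm_spmm(values, offsets, b, n, m):
    """Compute C = A_sparse x B, touching only the stored entries of A."""
    cols = len(b[0])
    c = [[0.0] * cols for _ in range(len(values))]
    for r, (row_vals, row_offs) in enumerate(zip(values, offsets)):
        for k, (v, off) in enumerate(zip(row_vals, row_offs)):
            a_col = (k // n) * m + off  # column of A, i.e. row of B
            for j in range(cols):
                c[r][j] += v * b[a_col][j]
    return c
```

With a 2:4 pattern, each group of four weights stores two values and two small offsets, so the inner loop performs half the multiply-accumulates of the dense product; a different N or M changes only the arguments, not the loop structure.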

Key words: deep neural network, graphics processing unit, vector processor, sparse matrix multiplication, pipeline

JIANG Jing-fei, HE Yuan-hong, XU Jin-wei, XU Shi-yao, QIAN Xi-fu. NM-SpMM: A semi-structured sparse matrix multiplication algorithm for domestic heterogeneous vector processors [J]. Computer Engineering & Science, 2024, 46(07): 1141-1150.
