• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

计算机工程与科学 ›› 2025, Vol. 47 ›› Issue (3): 381-391.

• 高性能计算 • 上一篇    下一篇

基于监督学习的稀疏矩阵乘算法优选

彭林,张鹏,陈俊峰,唐滔,黄春   

  1. (国防科技大学计算机学院,湖南 长沙 410073)
  • 收稿日期:2023-11-14 修回日期:2024-02-22 出版日期:2025-03-25 发布日期:2025-04-01
  • 基金资助:
    国家重点研发计划(2020YFA0709803)

Selection of sparse matrix multiplication algorithms based on supervised learning

PENG Lin,ZHANG Peng,CHEN Junfeng,TANG Tao,HUANG Chun   

  1. (College of Computer Science and Technology,National University of Defense Technology,Changsha 410073,China)
  • Received:2023-11-14 Revised:2024-02-22 Online:2025-03-25 Published:2025-04-01

摘要: 稀疏矩阵乘算法中主流的row-by-row计算公式上的SPA、HASH、ESC 3种稀疏矩阵乘实现算法,在对不同的稀疏矩阵进行计算时性能差异显著,在不同非零元规模上单一算法不总是能取得最佳性能,而且单一算法与最优选择存在明显差距。为此,提出了一种基于机器学习的最优稀疏矩阵乘算法选择模型,以给定矩阵集作为数据源,抽取稀疏矩阵的特征,并使用SPA、HASH、ESC计算获得的性能数据进行训练和验证,获得的模型能够仅使用稀疏矩阵的特征即可完成对新数据集的算法优选。实验结果表明,该模型可以获得91%以上的预测准确率,平均性能达到最优选择的98%,是单一算法性能的1.55倍以上,并且可在实际库函数中使用,具有良好的泛化能力和实用价值。

关键词: 稀疏矩阵乘, SpGEMM, SPA算法, HASH算法, ESC算法, 机器学习

Abstract: The sparse matrix multiplication algorithms with mainstream row-by-row calculation formulas, including SPA, HASH, and ESC, have significant performance disparities in different sparse matrices. There is no single optimal algorithm for all sparse matrices. A single algorithm cannot always achieve optimal performance on different non-zero element scales, and there is a significant gap between a single algorithm and the optimal selection. To this end, a selection model of sparse matrix multiplication algorithm based on supervised learning is proposed. A given set of matrices is used as the data source to extract the features of the sparse matrices, and performance data is obtained using SPA, HASH, and ESC calculations for training and validation. The resulting model can select the optimal algorithm for a new dataset solely based on the features of the sparse matrix. The experimental results show that this model can achieve a prediction accuracy of over 91%, with an average performance of 98% of the optimal selection, which is more than 1.55 times the performance of a single algorithm. It can also be used in practical library functions and has good generalization ability and practical value. 

Key words: sparse matrix multiplication, SpGEMM, SPA algorithm, HASH algorithm, ESC algorithm, machine learning