• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (7): 1170-1180.

• High Performance Computing •

High performance Cholesky factorization on emerging GPU architectures using Tensor Cores

SHI Lu, ZOU Gaoyuan, WU Siqi, ZHANG Shaoshuai

  1. (School of Computer Science and Engineering (School of Cyberspace Science and Technology),
    University of Electronic Science and Technology of China, Chengdu 611731, China)
  • Received: 2024-11-04 Revised: 2024-12-03 Online: 2025-07-25 Published: 2025-08-25

Abstract: General matrix-matrix multiplications (GEMMs) can achieve highly optimized performance on Tensor Cores. However, because of their limited parallelism, existing implementations of Cholesky factorization fail to attain most of the peak performance of Tensor Cores. This paper studies a recursive Cholesky factorization algorithm that recursively subdivides the diagonal blocks, generating a large number of GEMM operations between off-diagonal blocks. The algorithm extracts a higher proportion of the Tensor Core peak performance for the internal symmetric rank-k update (SYRK) and triangular solve (TRSM) operations. Experimental results show that the recursive Cholesky factorization algorithm proposed in this paper achieves speedups of 1.72× and 1.62× over the MAGMA/cuSOLVER algorithms on FP32 and FP16, respectively.
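The recursion described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes the textbook 2×2 block partition (factor the leading diagonal block, apply TRSM to the off-diagonal block, apply SYRK to the trailing diagonal block, then recurse); the abstract does not spell out the exact partition, so this is an assumption. All function names and the `base` cutoff are hypothetical, and the plain loops only stand in for the GEMM-based Tensor Core kernels; this is not the authors' implementation.

```python
import math


def _unblocked_cholesky(a, lo, hi):
    # Base case: column-by-column Cholesky on the diagonal block a[lo:hi, lo:hi].
    for j in range(lo, hi):
        d = a[j][j] - sum(a[j][k] ** 2 for k in range(lo, j))
        a[j][j] = math.sqrt(d)
        for i in range(j + 1, hi):
            s = a[i][j] - sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = s / a[j][j]


def _trsm(a, lo, mid, hi):
    # TRSM: overwrite A21 = a[mid:hi, lo:mid] with L21 = A21 * L11^{-T}.
    for i in range(mid, hi):
        for j in range(lo, mid):
            s = a[i][j] - sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = s / a[j][j]


def _syrk(a, lo, mid, hi):
    # SYRK: A22 -= L21 * L21^T, touching only the lower triangle of A22.
    for i in range(mid, hi):
        for j in range(mid, i + 1):
            a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo, mid))


def recursive_cholesky(a, lo=0, hi=None, base=2):
    """Overwrite the lower triangle of a[lo:hi, lo:hi] with its Cholesky factor.

    `a` is a list of lists holding a symmetric positive definite matrix.
    The strict upper triangle is neither read nor modified.
    """
    if hi is None:
        hi = len(a)
    n = hi - lo
    if n <= base:
        _unblocked_cholesky(a, lo, hi)
        return a
    mid = lo + n // 2
    recursive_cholesky(a, lo, mid, base)   # L11 = chol(A11)
    _trsm(a, lo, mid, hi)                  # L21 = A21 * L11^{-T}
    _syrk(a, lo, mid, hi)                  # A22 = A22 - L21 * L21^T
    recursive_cholesky(a, mid, hi, base)   # L22 = chol(updated A22)
    return a
```

Calling `recursive_cholesky(a, base=1)` on a small symmetric positive definite matrix exercises every level of the recursion. In a GPU setting the TRSM and SYRK steps would presumably be subdivided in the same way, so that most of the arithmetic lands in GEMM calls on off-diagonal blocks; the sketch leaves them as simple loops for readability.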


Key words: Cholesky factorization, high performance computing, numerical linear algebra, general-purpose computing on graphics processing units (GPGPU)