• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2020, Vol. 42 ›› Issue (11): 1913-1921.


Design of BLAS level3 computation on a matrix multiplication coprocessor

JIA Xun, QIAN Lei, YUAN Hao, ZHANG Kun, WU Dong

  1. (State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214125, China)
  • Received: 2020-06-08 Revised: 2020-08-11 Accepted: 2020-11-25 Online: 2020-11-25 Published: 2020-11-26

Abstract: BLAS level3 subprograms have high computation complexity, which usually become applications' performance bottleneck. By organizing largescale floatingpoint units into a linear array architecture, the matrix multiplication coprocessor can perform highperformance and efficient matrix multiplication. Achieving efficient BLAS level3 computation on the matrix multiplication coprocessor is essential for the acceleration of largescale science and engineering applications. 
By taking matrix multiplication as the kernel and combining the characteristics of the underlying linear array architecture, this paper proposes the design of BLAS level3 computation on a matrix multiplication coprocessor, and construct a corresponding performance model. Experimental results show that SYMM, SYRK and TRMM subprograms on the matrix multiplication coprocessor achieves the computation efficiency of 99%, 98% and 80% respectively, at most 31% higher than those on the SW26010 and NVIDIA V100 GPU.
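
As an illustration of what "taking matrix multiplication as the kernel" can mean, the following is a minimal sketch in pure Python, not the design proposed in the paper. It expresses SYMM, SYRK and TRMM as loops over blocks in which all arithmetic is delegated to a single `gemm` routine, which here merely stands in for the coprocessor. The function names, the block size `bs`, and the choice of the lower-triangular variants are assumptions made for this example.

```python
def gemm(A, B):
    """Dense product A @ B; stands in for the coprocessor's GEMM kernel."""
    inner, cols = len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(inner)) for j in range(cols)]
            for i in range(len(A))]

def sub(M, r, c, bs):
    """Block of M starting at (r, c), at most bs x bs (clipped at the edges)."""
    return [row[c:c + bs] for row in M[r:r + bs]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def accumulate(C, r, c, P):
    """Add block P into C with its top-left corner at (r, c)."""
    for i, row in enumerate(P):
        for j, v in enumerate(row):
            C[r + i][c + j] += v

def symm_lower(A, B, bs):
    """C = A_sym @ B, reading only the lower triangle of A."""
    n, m = len(A), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(0, n, bs):
        for k in range(0, n, bs):
            if i > k:                      # block stored below the diagonal
                blk = sub(A, i, k, bs)
            elif i < k:                    # mirror of a stored block
                blk = transpose(sub(A, k, i, bs))
            else:                          # diagonal block: symmetrize it
                d = sub(A, i, i, bs)
                blk = [[d[max(r, c)][min(r, c)] for c in range(len(d))]
                       for r in range(len(d))]
            accumulate(C, i, 0, gemm(blk, B[k:k + bs]))
    return C

def syrk_lower(A, bs):
    """Lower triangle of C = A @ A^T; blocks above the diagonal are skipped.
    Diagonal blocks are computed in full, so entries of C above the diagonal
    but inside a diagonal block are also filled in."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(0, n, bs):
        for j in range(0, i + 1, bs):
            accumulate(C, i, j, gemm(A[i:i + bs], transpose(A[j:j + bs])))
    return C

def trmm_lower(L, B, bs):
    """C = tril(L) @ B; zero blocks above the diagonal are never multiplied."""
    n, m = len(L), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(0, n, bs):
        for k in range(0, i + 1, bs):
            blk = sub(L, i, k, bs)
            if i == k:                     # mask the strict upper part
                blk = [[v if c <= r else 0.0 for c, v in enumerate(row)]
                       for r, row in enumerate(blk)]
            accumulate(C, i, 0, gemm(blk, B[k:k + bs]))
    return C
```

In this sketch the three routines differ only in which blocks are passed to `gemm` and how the diagonal blocks are prepared. A design built around a matrix multiplication kernel can exploit that structure, although the mapping onto the linear array and the performance model described in the abstract are not reproduced here.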



Key words: linear array, matrix multiplication, coprocessor, BLAS level-3
