J4 ›› 2015, Vol. 37 ›› Issue (05): 847-856.
• Paper •
ZHANG Shuai, LI Tao, WANG Yifeng, JIAO Xiaofan, YANG Yulu
Abstract:
Dense linear algebra (DLA), which is critical to many applications such as pattern recognition and bioinformatics, depends heavily on the general matrix-matrix multiplication (GEMM) routine. In the current cuBLAS and MAGMA libraries, GEMM is implemented with kernel functions that achieve high performance for large GEMMs. However, these kernels are inefficient for multiple independent small matrices, even though cuBLAS provides interfaces for batched small GEMMs. Moreover, they cannot automatically scale across multiple different GPUs with good load balancing. In this paper, we propose a task-parallel GEMM (TPGEMM) that exploits fine-grained task parallelism for batched and multi-GPU GEMMs. The workloads of one or more GEMMs are decomposed into tasks that are scheduled to persistent GPU kernels at runtime. TPGEMM avoids the overhead of launching multiple kernels and achieves better performance for batched small GEMMs than the cuBLAS and MAGMA libraries. Based on this low-overhead, fine-grained task scheduling, TPGEMM supports auto-parallelization across multiple GPUs and achieves an efficiency close to 100% on a workstation with four different GPUs.
Key words: GEMM; persistent kernel; task parallelism; load balancing
ZHANG Shuai, LI Tao, WANG Yifeng, JIAO Xiaofan, YANG Yulu. Exploring fine grained task parallel GEMM on single- and multi-GPU systems [J]. J4, 2015, 37(05): 847-856.
URL: http://joces.nudt.edu.cn/EN/Y2015/V37/I05/847