Orchestrating HPL between CPU and China accelerator

Computer Engineering & Science

GAN Xin-biao1,2, SUN Liao-yuan3, LIU Jie1, XIONG Cheng-wei1, HUANG Jia-kun1

(1. College of Computer, National University of Defense Technology, Changsha 410073;

2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093;

3. Institute of Quantum Information & State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China)

Received: 2016-12-12 Revised: 2017-02-15 Online: 2018-01-25 Published: 2018-01-25

Abstract

HPL is a Linpack benchmark package widely used for testing high-performance computing systems. In the traditional HPL algorithm, the matrix is divided into sub-matrices that are distributed among the computing elements. However, this scheme is ineffective on the China Accelerator, because the accelerator provides a dedicated built-in interface for matrix multiplication. We therefore propose dPEM (delicate Partition and Encapsulation on Matrix) to expose a friendly test configuration environment. Furthermore, we propose OA4MM (Orchestrating Algorithm for Matrix Multiplication) for heterogeneous systems composed of a CPU and a China Accelerator. Experimental results validate dPEM and OA4MM on the CPU + China Accelerator platform: OA4MM improves productivity by up to 10% compared with heterogeneous HPL.

Key words: HPL, China accelerator, delicate partition and encapsulation on matrix, orchestrating algorithm for matrix multiplication
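The abstract does not spell out how OA4MM divides work, so the following is only a generic illustration of the underlying idea of splitting one matrix multiplication between a CPU and an accelerator. It is a minimal sketch, not the authors' algorithm: the function names (`split_columns`, `matmul`, `hetero_matmul`) and the throughput parameters (`cpu_rate`, `acc_rate`) are hypothetical, and a plain Python loop stands in for each device's real GEMM routine.

```python
def split_columns(n_cols, cpu_rate, acc_rate):
    """Number of columns of B assigned to the CPU (hypothetical static split).

    The split is proportional to the assumed relative throughput of each
    device; the remaining columns go to the accelerator.
    """
    if n_cols < 0 or cpu_rate < 0 or acc_rate < 0 or cpu_rate + acc_rate == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return round(n_cols * cpu_rate / (cpu_rate + acc_rate))


def matmul(a, b):
    """Plain triple-loop product; stands in for a device-specific GEMM call."""
    inner = len(b)
    cols = len(b[0]) if b else 0
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def hetero_matmul(a, b, cpu_rate, acc_rate):
    """Compute C = A * B by giving each device a column panel of B."""
    cut = split_columns(len(b[0]), cpu_rate, acc_rate)
    b_cpu = [row[:cut] for row in b]   # panel assumed to run on the CPU
    b_acc = [row[cut:] for row in b]   # panel assumed to run on the accelerator
    c_cpu = matmul(a, b_cpu)
    c_acc = matmul(a, b_acc)
    # Concatenate the two result panels row by row to rebuild C.
    return [left + right for left, right in zip(c_cpu, c_acc)]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(hetero_matmul(a, b, cpu_rate=1, acc_rate=1))
```

In a real heterogeneous HPL run the two panels would be computed concurrently and the split ratio tuned to measured device performance; the sketch runs them sequentially only to show that the partitioned result matches the unpartitioned product.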

GAN Xin-biao1,2, SUN Liao-yuan3, LIU Jie1, XIONG Cheng-wei1, HUANG Jia-kun1. Orchestrating HPL between CPU and China accelerator[J]. Computer Engineering & Science.
