Computer Engineering & Science
GAN Xin-biao1,2, SUN Liao-yuan3, LIU Jie1, XIONG Cheng-wei1, HUANG Jia-kun1
Abstract:
HPL is an implementation of the Linpack benchmark that is widely used to test high-performance computing systems. The traditional HPL algorithm divides the matrix into sub-matrices and distributes them across the computing elements. However, this approach is inefficient on the China Accelerator, because the accelerator provides its own dedicated built-in interface for matrix multiplication. We therefore propose dPEM (delicate Partition and Encapsulation on Matrix), which exposes a friendly test configuration environment. Furthermore, we propose OA4MM (Orchestrating Algorithm for Matrix Multiplication) for heterogeneous systems composed of CPUs and China Accelerators. Experimental results validate dPEM and OA4MM on a CPU + China Accelerator system. OA4MM improves performance by up to 10% compared with the heterogeneous HPL.
Key words: HPL, China accelerator, delicate partition and encapsulation on matrix, orchestrating algorithm for matrix multiplication
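The general idea of dividing a matrix multiplication between a CPU and an accelerator can be illustrated with a minimal sketch. This is not the authors' OA4MM or dPEM, whose details the abstract does not give. It only shows a static column split of B in C = A x B. All names (`split_columns`, `hybrid_matmul`, `accel_share`) are hypothetical, the 0.75 share is an arbitrary assumption, and both parts run sequentially in plain Python here, whereas a real system would dispatch one part to the accelerator's matrix multiplication interface and run the two parts concurrently.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop product; stands in for a device's GEMM routine."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def split_columns(n_cols: int, accel_share: float) -> Tuple[range, range]:
    """Split column indices into an accelerator part and a CPU part.

    accel_share is an assumed tuning parameter (fraction of columns
    given to the accelerator); it is not taken from the paper.
    """
    cut = max(0, min(n_cols, round(n_cols * accel_share)))
    return range(0, cut), range(cut, n_cols)


def take_columns(m: Matrix, cols: range) -> Matrix:
    """Copy the selected columns of m into a new matrix."""
    return [[row[j] for j in cols] for row in m]


def hybrid_matmul(a: Matrix, b: Matrix, accel_share: float = 0.75) -> Matrix:
    """Compute a x b by multiplying a with two column blocks of b.

    Both blocks are computed sequentially here; in a heterogeneous
    system the first block would go to the accelerator and the second
    to the CPU.
    """
    accel_cols, cpu_cols = split_columns(len(b[0]), accel_share)
    parts = [matmul(a, take_columns(b, cols))
             for cols in (accel_cols, cpu_cols) if len(cols) > 0]
    # Concatenate the partial results side by side, row by row.
    return [sum((p[i] for p in parts), []) for i in range(len(a))]
```

Because each column block of the result depends only on A and the matching columns of B, the two partial products are independent, and the split ratio can be tuned to the relative speed of the two devices.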
GAN Xin-biao1,2, SUN Liao-yuan3, LIU Jie1, XIONG Cheng-wei1, HUANG Jia-kun1. Orchestrating HPL between CPU and China accelerator[J]. Computer Engineering & Science.
URL: http://joces.nudt.edu.cn/EN/
http://joces.nudt.edu.cn/EN/Y2018/V40/I01/10