• Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


Orchestrating HPL between CPU and China accelerator

GAN Xin-biao (1,2), SUN Liao-yuan (3), LIU Jie (1), XIONG Cheng-wei (1), HUANG Jia-kun (1)

  1. College of Computer, National University of Defense Technology, Changsha 410073;
  2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093;
  3. Institute of Quantum Information & State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha 410073, China
  • Received: 2016-12-12  Revised: 2017-02-15  Online: 2018-01-25  Published: 2018-01-25

Abstract:

HPL is a Linpack benchmark package widely used to test high-performance computing systems. In the traditional HPL algorithm, the matrix is divided into sub-matrices that are distributed among the computing elements. However, this approach is ineffective on the China Accelerator, because the China Accelerator provides its own dedicated built-in interface for matrix multiplication. We therefore propose dPEM (delicate Partition and Encapsulation on Matrix), which exposes a user-friendly test configuration environment. Furthermore, we propose OA4MM (Orchestrating Algorithm for Matrix multiplication) for a heterogeneous system composed of CPU and China Accelerator. Experimental results on a CPU + China Accelerator system validate both dPEM and OA4MM, and OA4MM improves productivity by up to 10% compared with heterogeneous HPL.

Key words: HPL, China accelerator, delicate partition and encapsulation on matrix, orchestrating algorithm for matrix multiplication
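The abstract contrasts HPL's standard data layout with orchestrated CPU/accelerator matrix multiplication but does not describe how dPEM or OA4MM work. The Python sketch below illustrates two generic ingredients under stated assumptions: the 2D block-cyclic mapping that HPL uses to distribute matrix blocks over a P × Q process grid, and a hypothetical static split of a matrix-multiplication update's columns between CPU and accelerator in proportion to measured throughput. The function names (`block_owner`, `local_blocks`, `split_columns`, `split_gemm`), the throughput figures, and the proportional-split policy are illustrative assumptions, not the authors' OA4MM algorithm or the China Accelerator's interface.

```python
def block_owner(bi, bj, p, q):
    """Process-grid coordinates (row, col) owning global block (bi, bj)
    in a 2D block-cyclic distribution over a p x q grid, as in HPL."""
    return bi % p, bj % q


def local_blocks(prow, pcol, nblocks, p, q):
    """Global block indices owned by process (prow, pcol) for an
    nblocks x nblocks block matrix under the same 2D block-cyclic mapping."""
    return [(bi, bj)
            for bi in range(prow, nblocks, p)
            for bj in range(pcol, nblocks, q)]


def split_columns(n, nb, cpu_gflops, acc_gflops):
    """Hypothetical static split: give the accelerator a share of the n
    columns proportional to its measured throughput, rounded down to a
    multiple of the block size nb. Assumes both rates are non-negative and
    not both zero. Returns (cpu_cols, acc_cols)."""
    share = acc_gflops / (cpu_gflops + acc_gflops)
    acc_cols = int(share * n) // nb * nb
    return n - acc_cols, acc_cols


def matmul(a, b):
    """Plain triple-loop product of list-of-lists matrices (stand-in for DGEMM)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def split_gemm(a, b, cpu_cols):
    """Compute a @ b as two independent column slices: the first cpu_cols
    columns stand in for CPU work, the rest for accelerator work (assumed
    placement). The partial results are concatenated column-wise."""
    b_cpu = [row[:cpu_cols] for row in b]
    b_acc = [row[cpu_cols:] for row in b]
    c_cpu = matmul(a, b_cpu)  # would run on the CPU
    c_acc = matmul(a, b_acc)  # would run on the accelerator
    return [r_cpu + r_acc for r_cpu, r_acc in zip(c_cpu, c_acc)]


if __name__ == "__main__":
    print(block_owner(3, 5, 2, 4))                      # (1, 1)
    print(split_columns(1000, 64, 100.0, 300.0))        # (296, 704)
    a = [[1, 2], [3, 4]]
    b = [[5, 6, 7], [8, 9, 10]]
    print(split_gemm(a, b, 1) == matmul(a, b))          # True
```

Because each column of the product depends only on the matching column of `b`, the two partial products can run concurrently on separate devices. Rounding the accelerator share down to a multiple of `nb` keeps both parts aligned to whole blocks, which matters when a device exposes a fixed-shape matrix-multiplication interface.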