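• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science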

QTRSM on ARMv8 64-bit multi-core processor

DU Qi, JIANG Hao, LI Kuan, PENG Lin, YANG Can-qun

  1. (College of Computer, National University of Defense Technology, Changsha 410073, China)
  • Received: 2016-09-05 Revised: 2016-11-07 Online: 2017-03-25 Published: 2017-03-25
  • Supported by:

    National 863 Program of China (2012AA01A301); National Natural Science Foundation of China (61402495, 61303189, 61602166, 61170049, 61402496)

Abstract:

We implement quad-precision triangular matrix solve with multiple right-hand sides (QTRSM) on an ARMv8 64-bit multi-core processor, building on OpenBLAS. Two implementations are presented, each based on a different data format. The first relies on the GCC compiler's support for the long double data type as a quad-precision floating-point number. The second uses the double-double data format together with its corresponding quad-precision addition, subtraction, multiplication, and division algorithms. Taking the long double QTRSM as the baseline, we compare the accuracy and running time of the two implementations across different matrix sizes. Experimental results show that the two produce numerical results of approximately the same accuracy, while the double-double QTRSM runs on average 1.6 times faster than the long double QTRSM. As the number of threads increases, the speedup of both implementations is close to 2.0, indicating good scalability.

Key words: ARMv8 64-bit multi-core processor, OpenBLAS, quad-precision, double-double data type, QTRSM
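
To make the double-double idea in the abstract concrete, the following is a minimal sketch in C, not the authors' OpenBLAS kernels. It represents one quad-precision value as an unevaluated sum of two doubles and builds addition, subtraction, multiplication, and division from the standard error-free transformations (Knuth's two-sum and Dekker's split product), then uses them in an unblocked, single-threaded forward substitution. All names here (dd_t, dd_add, dd_mul, dd_div, dd_trsm_lower) are hypothetical and chosen for illustration; the row-major layout and the lower-triangular, left-side case are assumptions as well.

```c
/* Double-double value: hi + lo, with |lo| much smaller than |hi|. */
typedef struct { double hi, lo; } dd_t;

static inline dd_t dd_from(double x) { return (dd_t){x, 0.0}; }
static inline dd_t dd_neg(dd_t a)    { return (dd_t){-a.hi, -a.lo}; }

/* Knuth two-sum: s + e equals a + b exactly. */
static inline dd_t two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;
    return (dd_t){s, (a - (s - bb)) + (b - bb)};
}

/* Fast two-sum: same result, but requires |a| >= |b|. */
static inline dd_t quick_two_sum(double a, double b) {
    double s = a + b;
    return (dd_t){s, b - (s - a)};
}

/* Dekker split: a == hi + lo, each half fits in 26 bits. */
static inline dd_t split(double a) {
    double t = 134217729.0 * a;          /* 2^27 + 1 */
    double hi = t - (t - a);
    return (dd_t){hi, a - hi};
}

/* Error-free product: p + e equals a * b exactly (barring overflow). */
static inline dd_t two_prod(double a, double b) {
    double p = a * b;
    dd_t x = split(a), y = split(b);
    double e = ((x.hi * y.hi - p) + x.hi * y.lo + x.lo * y.hi) + x.lo * y.lo;
    return (dd_t){p, e};
}

static inline dd_t dd_add(dd_t a, dd_t b) {
    dd_t s = two_sum(a.hi, b.hi);
    dd_t t = two_sum(a.lo, b.lo);
    s.lo += t.hi;
    s = quick_two_sum(s.hi, s.lo);
    s.lo += t.lo;
    return quick_two_sum(s.hi, s.lo);
}

static inline dd_t dd_sub(dd_t a, dd_t b) { return dd_add(a, dd_neg(b)); }

static inline dd_t dd_mul(dd_t a, dd_t b) {
    dd_t p = two_prod(a.hi, b.hi);
    p.lo += a.hi * b.lo + a.lo * b.hi;   /* cross terms; lo*lo is dropped */
    return quick_two_sum(p.hi, p.lo);
}

/* Division by three rounds of quotient estimate and residual correction. */
static inline dd_t dd_div(dd_t a, dd_t b) {
    double q1 = a.hi / b.hi;
    dd_t r = dd_sub(a, dd_mul(b, dd_from(q1)));
    double q2 = r.hi / b.hi;
    r = dd_sub(r, dd_mul(b, dd_from(q2)));
    double q3 = r.hi / b.hi;
    return dd_add(quick_two_sum(q1, q2), dd_from(q3));
}

/* Solve L * X = B in place (B is overwritten by X).
   L: n-by-n lower triangular, row-major; B: n-by-m, row-major. */
static void dd_trsm_lower(int n, int m, const dd_t *L, dd_t *B) {
    for (int j = 0; j < m; j++) {
        for (int i = 0; i < n; i++) {
            dd_t acc = B[i * m + j];
            for (int k = 0; k < i; k++)
                acc = dd_sub(acc, dd_mul(L[i * n + k], B[k * m + j]));
            B[i * m + j] = dd_div(acc, L[i * n + i]);
        }
    }
}
```

Every operation above uses only ordinary double-precision hardware instructions, which is one plausible reason a double-double QTRSM can outrun a long double one on platforms where 128-bit long double arithmetic is emulated in software. The sketch must be compiled without value-changing optimizations such as -ffast-math, since the error-free transformations depend on IEEE rounding behaviour.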