Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (01): 1-8.

• High Performance Computing •

Implementation and optimization of high-precision summation and dot product algorithms on Phytium processor

HUANG Chun1, JIANG Hao1, GU Tong-xiang2, QI Jin2, LIU Wen-chao1

  1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, Hunan, China
  2. Institute of Applied Physics and Computational Mathematics, Beijing 100088, China
  • Received: 2020-05-30  Revised: 2020-06-30  Accepted: 2021-01-25  Online: 2021-01-25  Published: 2021-01-22
  • Supported by:
    National Key Research and Development Program of China (2017YFB0202003); National Natural Science Foundation of China (61907034);
    Science Challenge Project (TZ2016002); Natural Science Foundation of Hunan Province (2018JJ3616)

Abstract: In large-scale, long-running numerical computations, the accumulation of rounding errors in floating-point operations can make numerical results unreliable. Summation and dot product are the most basic operations in floating-point numerical computation; they are called frequently in large-scale scientific computing, so the accuracy of their results is critical. Targeting the domestic Phytium processor and building on OpenBLAS, this paper uses error-free transformations to design efficient assembly kernel functions, and implements and optimizes high-precision summation and dot product algorithms. Numerical experiments show that the results of the high-precision algorithms are as accurate as those of the original algorithms computed in twice the working precision, which verifies their effectiveness. In the single-threaded case, the running times of the high-precision summation and dot product algorithms are, respectively, 1.57 and 1.76 times those of the original algorithms, so accuracy improves without a significant loss of efficiency. In the multi-threaded case, the running times are nearly identical to those of the original algorithms, which demonstrates the efficiency of the algorithms. Theoretical error analysis further confirms their reliability.

Key words: error-free transformation, floating-point number, high-precision, summation, dot product
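
As an illustration of the error-free transformation idea mentioned in the abstract, the following is a minimal Python sketch. The abstract does not name the specific algorithms, so the sketch assumes the classic compensated schemes (TwoSum, Dekker's TwoProd, and the Sum2/Dot2 loops built on them); the paper's actual kernels are written in assembly inside OpenBLAS and may differ. All function names here are hypothetical and chosen for illustration only.

```python
def two_sum(a, b):
    """Error-free sum: returns (s, e) with s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Veltkamp split of a double into two non-overlapping halves hi + lo == a."""
    c = 134217729.0 * a  # 2**27 + 1
    hi = c - (c - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Error-free product (Dekker): returns (p, e) with p = fl(a * b) and
    a * b == p + e exactly, barring overflow and underflow.
    Assumption: no FMA is used; hardware with FMA can compute e more cheaply."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def naive_sum(xs):
    """Plain left-to-right summation in working precision, for comparison."""
    s = 0.0
    for x in xs:
        s += x
    return s


def sum2(xs):
    """Compensated summation: accumulate the rounding error of every addition
    and add the accumulated correction back at the end."""
    s = 0.0
    c = 0.0
    for x in xs:
        s, e = two_sum(s, x)
        c += e
    return s + c


def dot2(xs, ys):
    """Compensated dot product: track the rounding errors of both the
    products and the running sum, then apply the correction once."""
    p = 0.0
    c = 0.0
    for x, y in zip(xs, ys):
        h, r = two_prod(x, y)   # h + r == x * y exactly
        p, q = two_sum(p, h)    # p + q == old p + h exactly
        c += q + r
    return p + c
```

For the input `[1.0, 1e100, 1.0, -1e100]`, `naive_sum` returns `0.0` because both ones are absorbed by the large term, while `sum2` returns the exact result `2.0`. The extra floating-point operations per element are consistent with the kind of single-threaded overhead the abstract reports, although this sketch says nothing about the measured factors themselves.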