Implementation and optimization of high-precision summation and dot product algorithms on Phytium processor

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (01): 1-8.

HUANG Chun1, JIANG Hao1, GU Tong-xiang2, QI Jin2, LIU Wen-chao1

(1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; 2. Institute of Applied Physics and Computational Mathematics, Beijing 100088, China)

Received: 2020-05-30 Revised: 2020-06-30 Accepted: 2021-01-25 Online: 2021-01-25 Published: 2021-01-22
Supported by:
National Key Research and Development Program of China (2017YFB0202003); National Natural Science Foundation of China (61907034);
Science Challenge Project (TZ2016002); Natural Science Foundation of Hunan Province (2018JJ3616)

Abstract

Abstract: In large-scale, long-running numerical computations, accumulated rounding errors in floating-point operations can make the numerical results unreliable. Summation and dot product are the most basic operations in floating-point numerical computation; they are called frequently in large-scale scientific computing, so the accuracy of their results is critical. Targeting the domestic Phytium processor and building on OpenBLAS, this paper uses error-free transformations to design efficient assembly kernel functions, and implements and optimizes high-precision summation and dot product algorithms. Numerical experiments show that the results of the high-precision algorithms are as accurate as those of the original algorithms computed in twice the working precision, which verifies their effectiveness. In the single-threaded case, the running times of the high-precision summation and dot product algorithms are 1.57 and 1.76 times those of the original algorithms, respectively, so the gain in accuracy comes without a significant loss of efficiency. In the multi-threaded case, the running times are almost the same as those of the original algorithms, which demonstrates the efficiency of the algorithms. A theoretical error analysis further supports their reliability.

Key words: error-free transformation, floating-point number, high-precision, summation, dot product
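The abstract does not spell out the algorithms, so the following is only an illustrative sketch of how error-free transformations can yield a sum or dot product that is as accurate as if it were computed in twice the working precision. It assumes the classic compensated schemes usually called Sum2 and Dot2 (built on Knuth's TwoSum and Dekker's TwoProduct); the function names `two_sum`, `split`, `two_prod`, `sum2`, and `dot2` are hypothetical, and the code is plain Python on IEEE 754 doubles, not the assembly kernels the paper describes for OpenBLAS on Phytium.

```python
def two_sum(a, b):
    """Knuth's TwoSum: s = fl(a + b) and e such that a + b = s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a):
    """Veltkamp split of a double into two non-overlapping halves, a = hi + lo."""
    c = 134217729.0 * a  # factor 2**27 + 1 for 53-bit significands
    hi = c - (c - a)
    return hi, a - hi


def two_prod(a, b):
    """Dekker's TwoProduct: p = fl(a * b) and e such that a * b = p + e
    (assuming no overflow or underflow)."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = al * bl - (((p - ah * bh) - al * bh) - ah * bl)
    return p, e


def sum2(xs):
    """Compensated summation: accumulate the rounding error of every addition."""
    s = 0.0
    c = 0.0  # running sum of the rounding errors
    for x in xs:
        s, e = two_sum(s, x)
        c += e
    return s + c


def dot2(xs, ys):
    """Compensated dot product: track the errors of both products and additions."""
    p = 0.0
    c = 0.0  # running sum of the rounding errors
    for x, y in zip(xs, ys):
        h, r = two_prod(x, y)  # r is the rounding error of the product
        p, q = two_sum(p, h)   # q is the rounding error of the addition
        c += q + r
    return p + c


if __name__ == "__main__":
    print(sum2([1e16, 1.0, -1e16]))
    a = 1.0 + 2.0 ** -30
    print(dot2([a, -1.0], [a, 1.0 + 2.0 ** -29]))
```

Both sample inputs are chosen so that a plain left-to-right evaluation in double precision cancels to zero, while the compensated versions recover the small exact result. Each compensated step costs several extra floating-point operations, which is consistent with the modest single-threaded slowdown the abstract reports.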

HUANG Chun, JIANG Hao, GU Tong-xiang, QI Jin, LIU Wen-chao. Implementation and optimization of high-precision summation and dot product algorithms on Phytium processor[J]. Computer Engineering & Science, 2021, 43(01): 1-8.
