• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (01): 18-26.

• High Performance Computing •

  • Funding:
    National Key Research and Development Program of China (2020YFA0709803); National Natural Science Foundation of China (61902407)

Optimization of exponential and logarithm functions for vector units

SHEN Jie, LONG Biao, HUANG Chun, TANG Tao, PENG Lin

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
  • Received: 2023-07-13 Revised: 2013-12-20 Accepted: 2025-01-25 Online: 2025-01-25 Published: 2025-01-18


Abstract: Exponential and logarithmic functions are important transcendental functions in floating-point computation and are widely used across application domains. Vector registers in modern processors have grown wider with each generation, so optimizing vector exponential and logarithmic functions is of both scientific and practical value for improving how well applications utilize vector units. To address the performance bottlenecks of existing vector function implementations, this paper designs and implements optimization methods for exponential and logarithmic functions targeting vector units: vector table-lookup optimization based on hardware acceleration instructions, branch optimization, and precision-performance trade-off optimization. Experiments on a simulator show that the optimized vector exponential and logarithmic functions meet the industry's high-precision standard and outperform the current best open-source implementation, with speedups of 1.44 or more. Tests on real applications further show that, with the optimized vector functions, applications can be vectorized efficiently, achieving an average performance improvement of 2.53 times over the original scalar implementations.

Key words: exponential functions, logarithm functions, vectorization, table-lookup optimization, hardware acceleration instructions
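
The abstract names table-lookup optimization as one of the techniques, but this page gives no implementation details. As a generic illustration only, and not the authors' method, the following Python sketch shows the textbook table-driven scheme for exp(x): write x = (k·N + j)·ln2/N + r, so that exp(x) = 2^k · 2^(j/N) · exp(r) with a small remainder r. The function name `exp_table`, the table size N = 32, and the degree-4 polynomial are all assumptions chosen for readability. Handling of overflow, underflow, and NaN is omitted.

```python
import math

LOG2_N = 5                      # assumed table size: N = 2**5 = 32 entries
N = 1 << LOG2_N
TABLE = [2.0 ** (j / N) for j in range(N)]   # precomputed 2^(j/N)
STEP = math.log(2.0) / N        # width of one table interval, ln2 / N
INV_STEP = N / math.log(2.0)


def exp_table(x: float) -> float:
    """Hypothetical table-driven exp(x) for moderate |x| (no special cases)."""
    m = round(x * INV_STEP)     # nearest multiple of ln2/N
    k = m >> LOG2_N             # exponent part, floor(m / N)
    j = m & (N - 1)             # table index, m mod N (always 0..N-1)
    r = x - m * STEP            # remainder, |r| <= ln2 / (2N)
    # Degree-4 Taylor polynomial for exp(r), evaluated with Horner's rule.
    p = 1.0 + r * (1.0 + r * (0.5 + r * (1.0 / 6.0 + r * (1.0 / 24.0))))
    return math.ldexp(TABLE[j] * p, k)   # scale by 2^k


# Element-wise use over a batch of inputs.
values = [exp_table(x) for x in (-1.0, 0.0, 0.5, 3.0)]
```

The core path contains no data-dependent branches: every input goes through the same sequence of multiply, round, index, and polynomial steps, which is why schemes of this shape map naturally onto SIMD lanes, with the table read becoming a vector gather or a dedicated lookup instruction where the hardware provides one.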