• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2021, Vol. 43, Issue (04): 662-669.

High-performance implementation and optimization of Square Root function based on SIMD

ZHAO Yong-hao1,2, JIA Hai-peng2, ZHANG Yun-quan2, ZHANG Si-jia1

  1. College of Information Engineering, Dalian Ocean University, Dalian 116023, China

  2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

  • Received:2020-12-22 Revised:2021-01-26 Accepted:2021-04-25 Online:2021-04-25 Published:2021-04-21

Abstract: In computer graphics, integral computation, neural networks, and other application scenarios, a high-performance implementation of the square root function is an important part of a processor's basic software ecosystem. With the widespread adoption of ARM processors, fast implementations of mathematical functions on the ARM architecture have become increasingly important. Because a large number of current processors adopt a SIMD architecture, SIMD-based methods for high-performance function evaluation are of considerable practical value. This paper therefore implements and optimizes a high-performance square root function. By analyzing how IEEE 754 floating-point numbers are stored in memory, an efficient square root algorithm is designed; its precision is then improved by combining the reciprocal square root with a Taylor expansion. Finally, performance is further improved through SIMD optimization. Experimental results show that, while meeting the accuracy requirement, the implemented square root function achieves more than 7 times the performance of the libm library and more than 3 times that of the square root instruction provided by ARMv8.

Key words: square root function, SIMD, high performance, numerical analysis, ARMv8 architecture
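
The abstract does not give the algorithm itself, so the following is only a minimal scalar sketch of the general idea it describes: derive an initial estimate from the IEEE 754 bit pattern, then refine a reciprocal square root. It assumes the widely known magic constant 0x5F3759DF for the initial guess and Newton-Raphson refinement (which comes from a first-order Taylor linearization). The paper's own constants, refinement formula, and SIMD code may differ. The function names are hypothetical.

```python
import math
import struct


def float_to_bits(x: float) -> int:
    """Round x to IEEE 754 binary32 and return its 32-bit pattern as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def approx_sqrt(x: float, iterations: int = 3) -> float:
    """Approximate sqrt(x) for positive, finite x in the binary32 normal range.

    Assumption: magic-constant initial guess plus Newton-Raphson steps;
    this is an illustration, not the method from the paper.
    """
    if not (x > 0.0) or math.isinf(x):
        raise ValueError("x must be positive and finite")

    # Shifting the bit pattern right by one roughly halves the exponent;
    # subtracting from the magic constant negates it, which yields a
    # rough estimate of 1/sqrt(x) (relative error of a few percent).
    y = bits_to_float(0x5F3759DF - (float_to_bits(x) >> 1))

    # Newton-Raphson on f(y) = 1/y^2 - x. Each step roughly squares the
    # relative error and uses only multiplies and subtractions.
    # Note: Python floats are binary64, so refinement runs in double precision.
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)

    # sqrt(x) = x * (1/sqrt(x))
    return x * y


if __name__ == "__main__":
    for value in (0.25, 2.0, 9.0, 12345.678):
        print(value, approx_sqrt(value), math.sqrt(value))
```

Because the refinement loop is branch-free and uses only multiplication and subtraction, the same arithmetic can in principle be applied lane by lane to a SIMD vector of inputs, which is the kind of optimization the abstract refers to.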