• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (01): 1-9.

• High Performance Computing •

A systolic array optimization strategy for switching matrix blocks in advance

JU Xin, CAO Ya-song, WEN Mei, WANG Zhi, FENG Jing

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
  • Received: 2022-10-13 Revised: 2022-11-15 Accepted: 2023-01-25 Online: 2023-01-25 Published: 2023-01-25

Abstract: The demand of AI applications for hardware computing power grows year by year, driving AI accelerators toward higher performance. Research shows that the main computation in AI applications can be transformed into matrix multiplication, and the systolic array has become one of the mainstream matrix multiplication acceleration techniques because of its particular advantages for this workload. However, streaming a matrix into and out of a systolic array incurs pipeline fill and drain overhead. This overhead is especially significant for floating-point systolic arrays that support training, whose multiply-accumulate (MAC) latency exceeds one cycle. Untimely switching between matrix blocks then causes a sharp drop in processing element (PE) utilization. To address these problems, a theoretical analysis based on typical application scenarios is conducted, and an early switching strategy between matrix blocks is proposed that accurately calculates the optimal switching time between matrix blocks in various situations. The design is implemented at the register-transfer level (RTL). Experimental results show that the optimized systolic array incurs a slight increase in hardware overhead while improving performance in all scenarios.

Key words: systolic array, AI, GEMM, acceleration, processing element (PE) utilization
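
The fill and drain overhead described in the abstract can be illustrated with a toy cycle-count model. The sketch below is not the paper's analysis: the function name `utilization`, the skew term `rows + cols - 2`, and the assumption that early switching hides all per-block overhead except the first are hypothetical simplifications chosen for illustration only.

```python
def utilization(num_blocks, k, rows, cols, mac_latency, early_switch):
    """Toy PE-utilization estimate for a rows x cols systolic array.

    Assumptions (illustrative, not taken from the paper):
      - each matrix block streams k input vectors, one per cycle;
      - skewed data flow adds rows + cols - 2 fill/drain cycles;
      - a MAC latency of L cycles adds L - 1 further cycles.
    """
    useful = num_blocks * k
    overhead = (rows + cols - 2) + (mac_latency - 1)
    if early_switch:
        # Idealized overlap: the next block starts streaming while the
        # previous one drains, so the overhead is paid only once.
        total = useful + overhead
    else:
        # Each block waits for the previous block to drain completely.
        total = num_blocks * (k + overhead)
    return useful / total


if __name__ == "__main__":
    for early in (False, True):
        u = utilization(num_blocks=4, k=16, rows=8, cols=8,
                        mac_latency=4, early_switch=early)
        print(f"early_switch={early}: utilization={u:.2f}")
```

In this model the gap between the two cases widens as `mac_latency` grows or as `k` shrinks, which matches the qualitative claim that late switching hurts PE utilization most for multi-cycle floating-point MACs. The fully overlapped case is only an idealized upper bound; the optimal switching times derived in the paper are not reproduced here.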