A systolic array optimization strategy for switching matrix blocks in advance

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (01): 1-9.

• High Performance Computing •

JU Xin, CAO Ya-song, WEN Mei, WANG Zhi, FENG Jing

(College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

Received: 2022-10-13 Revised: 2022-11-15 Accepted: 2023-01-25 Online: 2023-01-25 Published: 2023-01-25

Abstract

The demand of AI applications for hardware computing power grows year by year, driving AI accelerators towards higher performance. Research shows that the main computation in AI applications can be transformed into matrix multiplication, and the systolic array has become one of the mainstream techniques for accelerating matrix multiplication because it maps naturally onto that operation. However, streaming a matrix into and out of a systolic array incurs pipeline filling and emptying overhead. The overhead is especially pronounced for floating-point systolic arrays that support training, whose multiply-accumulate (MAC) latency exceeds one cycle. If the array does not switch between matrix blocks in time, processing element (PE) utilization drops sharply. To address these problems, the authors carry out a theoretical analysis based on typical application scenarios and propose an early switching strategy between matrix blocks, which accurately calculates the optimal switching time between matrix blocks in various situations. An RTL design of the strategy was implemented. Experimental results show that the optimized systolic array has slightly higher hardware overhead, but its performance improves in all tested scenarios.

Key words: systolic array, AI, GEMM, acceleration, processing element (PE) utilization
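
The effect of fill and drain overhead on PE utilization, as described in the abstract, can be illustrated with a deliberately simplified toy model. This is not the authors' analysis or their switching-time formula: the function name, its parameters, and the assumption that early switching hides all but one fill/drain period are hypothetical choices made only for illustration.

```python
def pe_utilization(num_blocks: int, stream_cycles: int,
                   fill_drain_cycles: int, early_switch: bool) -> float:
    """Toy estimate of PE utilization for a stream of matrix blocks.

    Assumptions (illustrative only, not taken from the paper):
      - each block keeps the PEs busy for `stream_cycles` cycles;
      - each block also costs `fill_drain_cycles` of pipeline fill/drain;
      - with early switching, the fill/drain of consecutive blocks is
        fully overlapped, so the overhead is paid only once in total.
    """
    useful = num_blocks * stream_cycles
    if early_switch:
        total = useful + fill_drain_cycles
    else:
        total = num_blocks * (stream_cycles + fill_drain_cycles)
    return useful / total


if __name__ == "__main__":
    # Hypothetical numbers: 4 blocks, 16 useful cycles and 16 overhead cycles each.
    for early in (False, True):
        u = pe_utilization(4, 16, 16, early_switch=early)
        print(f"early_switch={early}: utilization={u:.2f}")
```

Under these assumed numbers the model shows why the overhead matters most when the useful streaming time per block is short relative to the pipeline depth; a longer MAC latency would enlarge `fill_drain_cycles` and widen the gap between the two cases.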

JU Xin, CAO Ya-song, WEN Mei, WANG Zhi, FENG Jing. A systolic array optimization strategy for switching matrix blocks in advance[J]. Computer Engineering & Science, 2023, 45(01): 1-9.
