• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (05): 802-809.

• High Performance Computing •

A fused-layer attention model accelerator based on systolic array

LIU Xiao-hang1, JIANG Jing-fei2, XU Jin-wei2

  (1. Graduate College, National University of Defense Technology, Changsha 410073;
  2. Science and Technology on Parallel and Distributed Processing Laboratory,
  National University of Defense Technology, Changsha 410073, China)
  • Received: 2022-10-24; Revised: 2022-12-15; Accepted: 2023-05-25; Online: 2023-05-25; Published: 2023-05-16

Abstract: The attention mechanism has recently shown superior performance in deep neural networks, but its computation generates complex data flow and incurs high computation and memory overheads, so customized accelerators are needed to optimize inference. This paper proposes an accelerator architecture for attention-mechanism computation. A flexible, hardware-controlled partitioning method divides the large matrices in the attention model into hardware-friendly computing blocks, so that the block computation matches the systolic array in the accelerator. A layer-fusion computing structure based on a two-step decomposition of the softmax function is proposed, which effectively reduces the memory accesses of attention computation. A fused-layer attention model accelerator based on fine-grained computational scheduling is designed and implemented in HDL. Its performance was evaluated on a Xilinx FPGA device with the HLS tool. Compared with CPU and GPU implementations under the same settings, the accelerator achieves a 4.91x latency improvement and a 1.24x efficiency improvement.
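The paper does not spell out its two-step softmax decomposition here, but a common decomposition used for layer fusion splits softmax into a streaming statistics pass (a running row maximum and a rescaled exponent sum) followed by a normalization pass, which avoids materializing the full score matrix between layers. The following is a minimal NumPy sketch of that general technique, assuming row-wise softmax over an attention score matrix; the function name and block width are illustrative, not taken from the paper.

    import numpy as np

    def two_step_softmax(scores: np.ndarray, block: int = 64) -> np.ndarray:
        # Step 1: stream over each row in blocks, keeping a running row
        # max m and a running sum s of exp(x - m); the old sum is rescaled
        # whenever the max grows, so only O(rows) statistics persist
        # between blocks (block here stands in for a systolic-array tile).
        m = np.full(scores.shape[0], -np.inf)
        s = np.zeros(scores.shape[0])
        for j in range(0, scores.shape[1], block):
            blk = scores[:, j:j + block]
            m_new = np.maximum(m, blk.max(axis=1))
            s = s * np.exp(m - m_new) + np.exp(blk - m_new[:, None]).sum(axis=1)
            m = m_new
        # Step 2: normalize with the finished statistics; this pass can be
        # fused into the computation that consumes the probabilities.
        return np.exp(scores - m[:, None]) / s[:, None]

Because the statistics pass only needs each score block once, it can be fused with the preceding matrix multiplication that produces the scores; this is the general mechanism by which such decompositions reduce memory accesses, as the abstract describes.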

Key words: systolic array, attention mechanism, fused-layer, accelerator architecture, matrix blocking, softmax