A hybrid matrix-vector processor with dynamically reconfigurable dataflow

Computer Engineering & Science, 2025, Vol. 47, Issue 11: 1912-1921.

High Performance Computing

AI Chenyang¹, ZHAO Lechuan, HUA Tao, WANG Xin’an, WANG Ying

(1. School of Electronic and Computer Engineering, Peking University, Shenzhen 518000;
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)

Received: 2024-11-14  Revised: 2025-01-04  Online: 2025-11-25  Published: 2025-12-04

Abstract

Systolic arrays are energy-efficient accelerators for general matrix multiplication (GEMM) and have attracted wide attention from both academia and industry. However, they occupy substantial area and typically must be paired with a vector processing unit (VPU), a combination common in neural network accelerators. They also suffer from low temporal and spatial utilization and from limited end-to-end performance. To address these problems, a hybrid vector systolic array (HVSA) is proposed that integrates a systolic array with a vector processor. By reusing the storage, broadcast, and inter-channel communication units of the VPU, the architecture makes both the array shape and the dataflow reconfigurable, so that GEMM and vector operations are supported more efficiently at an acceptable hardware area overhead. An end-to-end compilation framework for HVSA is also introduced, comprising an MLIR-based compiler frontend, dataflow scheduling, and a programming model compatible with the RISC-V vector extension. Experimental results show that HVSA achieves a 30.30× speedup over a systolic array of equivalent area. In end-to-end applications, compared with a "VPU+SA" design of the same area, HVSA reduces the average running time to about 4.7% of the original and reduces energy consumption by approximately 58.7%.
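The end-to-end figures in the abstract are stated as a remaining fraction of running time and as a percentage reduction in energy, which are easy to misread against the "fold" speedup quoted earlier. The short sketch below only converts the two reported numbers into comparable forms; the function names are illustrative and not from the paper, and the results are plain arithmetic on the abstract's rounded values rather than additional measurements.

```python
def speedup_from_time_fraction(fraction: float) -> float:
    """Speedup implied when running time drops to `fraction` of the baseline."""
    return 1.0 / fraction


def remaining_after_reduction(reduction: float) -> float:
    """Fraction of the baseline that remains after a relative reduction."""
    return 1.0 - reduction


# Values quoted in the abstract (both are rounded in the source).
time_fraction = 0.047      # running time is about 4.7% of the "VPU+SA" baseline
energy_reduction = 0.587   # energy consumption is reduced by about 58.7%

print(f"implied end-to-end speedup: {speedup_from_time_fraction(time_fraction):.1f}x")
print(f"energy remaining vs. baseline: {remaining_after_reduction(energy_reduction):.1%}")
```

Under these rounded inputs, the running-time figure corresponds to roughly a 21× end-to-end speedup, and the energy figure means HVSA uses about 41% of the baseline's energy.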

Key words: general matrix multiplication (GEMM), vector operation, systolic array, vector processing unit (VPU), dataflow scheduling, compiler

AI Chenyang, ZHAO Lechuan, HUA Tao, WANG Xin’an, WANG Ying. A hybrid matrix-vector processor with dynamically reconfigurable dataflow[J]. Computer Engineering & Science, 2025, 47(11): 1912-1921.

