• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (04): 580-589.

• High Performance Computing •

Convolutional neural network inference and training vectorization method for multicore vector accelerators

CHEN Jie, LI Cheng, LIU Zhong

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
  • Received: 2023-01-04 Revised: 2023-05-08 Accepted: 2024-04-25 Online: 2024-04-25 Published: 2024-04-17

Abstract: With the widespread application of deep learning, represented by convolutional neural networks (CNNs), the computational requirements of neural network models have grown rapidly, driving the development of deep learning accelerators and shifting the research focus to accelerating and optimizing neural network models based on the architectural characteristics of those accelerators. For VGG network inference and training on the independently designed multicore vector accelerator FT-M7004, vectorized mapping methods are proposed for the core operators, including convolution, pooling, and fully connected layers. Optimization strategies such as SIMD vectorization, double-buffered DMA transfers, and weight sharing are employed to fully exploit the architectural advantages of the vector accelerator and achieve high computational efficiency. Experimental results show that on the FT-M7004 platform, the average computational efficiency of convolution layer inference and training is 86.62% and 69.63%, respectively, while fully connected layer inference and training reach 93.17% and 81.98%, respectively. The inference efficiency of the VGG network model on FT-M7004 exceeds that on a GPU platform by more than 20%.
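The abstract's central idea, mapping convolution onto a vector unit, is commonly realized by unfolding input patches into a matrix so the convolution becomes one large matrix product that SIMD or GEMM hardware executes efficiently. The sketch below is a minimal, generic illustration of that im2col-plus-GEMM mapping in NumPy; it is not the paper's FT-M7004 implementation, and the function names and the stride-1, no-padding setting are assumptions for illustration only.

```python
# Generic sketch of the im2col + GEMM convolution mapping (illustrative only;
# not the FT-M7004 kernels from the paper). Stride 1, no padding assumed.
import numpy as np

def im2col(x, kh, kw):
    """Unfold every kh-by-kw patch of a 2-D input into one column."""
    h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols, oh, ow

def conv2d_gemm(x, k):
    """Express the convolution (cross-correlation) as a single matrix product."""
    kh, kw = k.shape
    cols, oh, ow = im2col(x, kh, kw)
    # Flattened kernel times patch matrix: one GEMM call a vector unit can run.
    return (k.ravel() @ cols).reshape(oh, ow)

x = np.arange(16.0).reshape(4, 4)   # toy 4x4 input
k = np.ones((2, 2))                 # toy 2x2 kernel
out = conv2d_gemm(x, k)             # 3x3 output; out[0, 0] = 0 + 1 + 4 + 5 = 10
```

On a vector accelerator, the patch matrix would be tiled through on-chip memory, which is where optimizations like double-buffered DMA transfers (prefetching the next tile while the current one is multiplied) come into play.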

Key words: multicore vector accelerator, convolutional neural network, inference algorithm, training algorithm