• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


A GPU-based high-performance optimization method
for sparse convolutional neural networks

FANG Cheng,XING Zuocheng,CHEN Xuhao,ZHANG Yang   

  1. (College of Computer,National University of Defense Technology,Changsha 410073,China)
  • Received:2018-06-21 Revised:2018-08-15 Online:2018-12-25 Published:2018-12-25

Abstract:

As an important branch of neural networks, convolutional neural networks (CNNs) are currently better suited to learning and representing image features than other neural network methods. As CNNs continue to develop, new challenges arise: their parameter scale keeps growing, which makes their computational demand enormous. There are many ways to compress a CNN, but a compressed network usually introduces a number of sparse data structures, and these sparse data structures can hurt CNN performance on GPUs. To solve this problem, we adopt the direct sparse convolution algorithm proposed in 2017 to accelerate the GPU's processing of sparse data. Following the characteristics of this algorithm, we transform the convolution operation into an inner product of a sparse vector and a dense vector on the GPU platform. Our optimization makes full use of the sparse data and the network structure to allocate threads for task scheduling, and exploits data locality to manage memory replacement, enabling the GPU to process the convolution layers of a sparse CNN efficiently. Compared with cuBLAS, our proposal achieves speedups of 1.07×~1.23×, 1.17×~3.51× and 1.32×~5.00× on AlexNet, GoogLeNet and ResNet respectively. Compared with cuSPARSE, it achieves speedups of 1.31×~1.42×, 1.09×~2.00× and 1.07×~3.22× on the same three networks.
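The core transformation described above (convolution as an inner product of a sparse weight vector and a dense input vector) can be sketched in plain Python. This is an illustrative CPU-side sketch, not the paper's GPU implementation: the function names are ours, the filter is stored as (value, row-offset, column-offset) triples for its nonzeros, and each output element is computed as a dot product over only those nonzeros, which is the essence of direct sparse convolution.

```python
def sparse_filter(kernel):
    """Flatten a dense 2-D kernel into (value, dy, dx) triples for nonzeros."""
    return [(kernel[dy][dx], dy, dx)
            for dy in range(len(kernel))
            for dx in range(len(kernel[0]))
            if kernel[dy][dx] != 0]

def direct_sparse_conv2d(image, sparse_w, kh, kw):
    """Valid (no-padding) 2-D convolution using only the kernel's nonzeros.

    Each output pixel is an inner product between the sparse weight list
    and the dense input patch it overlaps.
    """
    H, W = len(image), len(image[0])
    out_h, out_w = H - kh + 1, W - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            acc = 0.0
            for v, dy, dx in sparse_w:      # skip zero weights entirely
                acc += v * image[y + dy][x + dx]
            out[y][x] = acc
    return out
```

On a GPU, the outer two loops would be distributed across threads (the paper's thread-allocation and locality optimizations decide exactly how); the key point is that the inner loop's cost scales with the number of nonzero weights rather than with the full kernel size.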

Key words: convolutional neural network, sparse, parallel, optimization, GPU