• Sponsored journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


A configurable convolutional neural network
accelerator based on tiling dataflow
 

LI Yihuang,MA Sheng,GUO Yang,CHEN Guilin,XU Rui   

  1. (School of Computer, National University of Defense Technology, Changsha 410073, China)
  • Received:2018-11-23 Revised:2019-01-04 Online:2019-06-25 Published:2019-06-25

Abstract:

Convolutional neural networks (CNNs) are among the most successful deep learning algorithms and are widely used in image recognition, automatic translation and advertising recommendation. As neural networks grow in scale, the numbers of neurons and synapses increase accordingly, so using dedicated acceleration hardware to exploit the parallelism of CNNs has become a popular choice. Among hardware designs, the classic tiling dataflow achieves high performance, but the utilization of processing elements in the tiling structure is very low. As deep learning applications demand higher hardware performance, accelerators require higher processing-element utilization. To achieve this goal, the scheduling order can be changed to improve performance, and input feature maps and output channels can be processed in parallel to increase computational parallelism. However, as the computational demands of neural networks on hardware grow, the processing-element array inevitably becomes larger and larger. Once the array exceeds a certain size, a single parallelization approach causes utilization to decrease gradually, so the hardware must exploit more forms of neural network parallelism to keep processing elements from idling. At the same time, to adapt to different network structures, the hardware array must be configurable for each neural network, but configurability can greatly increase hardware overhead and complicate data scheduling. We therefore propose a configurable neural network accelerator based on the tiling dataflow. To reduce hardware complexity, we propose a partial configuration technique, which improves the utilization of processing elements for large arrays while keeping hardware overhead as low as possible. When the processing-element array size exceeds 512, utilization remains at 82%~90% on average, and the accelerator's performance scales almost linearly with the number of processing elements.
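The utilization argument in the abstract can be illustrated with a simple model. The sketch below is not the authors' accelerator design; it only models, under hypothetical layer and array sizes, why mapping a single loop dimension onto a large processing-element (PE) array leaves most PEs idle, while mapping two independent dimensions (e.g. output channels and input feature maps) onto a 2D array recovers utilization.

```python
# Illustrative utilization model for a tiling dataflow (hypothetical sizes,
# not the accelerator described in the paper).
import math

def utilization_single(work, num_pes):
    """Parallelize one loop dimension of size `work` over `num_pes` PEs.
    Each pass occupies the whole array; leftover PEs idle."""
    passes = math.ceil(work / num_pes)
    return work / (passes * num_pes)

def utilization_dual(work_a, work_b, pe_rows, pe_cols):
    """Map two independent dimensions onto a 2D PE array (rows x cols),
    e.g. output channels on rows and input feature maps on columns."""
    passes = math.ceil(work_a / pe_rows) * math.ceil(work_b / pe_cols)
    return (work_a * work_b) / (passes * pe_rows * pe_cols)

# Hypothetical layer: 96 output channels, 48 input feature maps.
# On a 512-PE array, one-dimensional parallelism is badly underutilized:
print(utilization_single(96, 512))       # -> 0.1875
# Mapping both dimensions onto a 32x16 array (also 512 PEs) fills it:
print(utilization_dual(96, 48, 32, 16))  # -> 1.0
```

This mirrors the abstract's observation: beyond a certain array size, a single parallelization dimension cannot supply enough independent work, so exploiting additional dimensions of CNN parallelism is what keeps large arrays busy.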
 

Key words: CNN, tiling dataflow, configurable, parallelism