• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (05): 782-791.


Performance evaluation and optimization of distributed and parallel deep neural network on the Tianhe-3 prototype system

WEI Jia,ZHANG Xing-jun,JI Ze-yu,LI Jing-bo,YUE Ying-ying   

  1. (School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710127, China)

  • Received:2020-12-08 Revised:2021-02-04 Accepted:2021-05-25 Online:2021-05-25 Published:2021-05-19

Abstract: The deep neural network (DNN) model is an important branch of the artificial neural network (ANN) model and the foundation of deep learning. In recent years, advances in computing power and high-performance computing technology have made it feasible to increase DNN depth and model complexity in order to improve feature extraction and data fitting capabilities. As a result, DNNs have shown advantages in natural language processing, autonomous driving, face recognition, and other problems. However, big data and complex models have greatly increased the training cost of deep neural networks, so accelerating the training process has become a key task, with techniques ranging from underlying circuit design to distributed algorithm design. The domestically developed Tianhe-3 supercomputer targets exascale peak performance (on the order of 10^18 operations per second), and this enormous computing power offers a potential opportunity for DNN training. Based on the ARM architecture of the Tianhe-3 prototype, this paper uses the PyTorch framework and MPI to carry out specially designed CNN training on a single FT-2000+ compute node, a single MT-2000+ compute node, and multi-node clusters built from them. The distributed training performance of these processors is evaluated and optimized, providing experimental data and a theoretical basis for further improving the performance of the Tianhe-3 prototype system in distributed neural network training.
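The data-parallel scheme underlying the MPI-based training described in the abstract can be illustrated with a minimal, dependency-free sketch: each worker computes a gradient on its local data shard, the gradients are averaged (the role MPI_Allreduce plays in a real run), and every worker applies the same update so model replicas stay synchronized. The linear model, data, and hyperparameters below are illustrative, not taken from the paper.

```python
# Minimal simulation of synchronous data-parallel SGD. In a real MPI job
# each shard lives on a different process; here the workers run in a loop.

def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    g = 0.0
    for x, y in shard:
        g += 2.0 * (w * x - y) * x
    return g / len(shard)

def allreduce_mean(grads):
    """Stand-in for MPI_Allreduce (sum) followed by division by world size."""
    return sum(grads) / len(grads)

def train(shards, w=0.0, lr=0.05, steps=200):
    for _ in range(steps):
        grads = [local_gradient(w, s) for s in shards]  # computed in parallel in real MPI
        w -= lr * allreduce_mean(grads)                 # identical update on every worker
    return w

# Data from y = 3x, split round-robin across 4 "workers"
data = [(x / 10.0, 3.0 * x / 10.0) for x in range(40)]
shards = [data[i::4] for i in range(4)]
w = train(shards)
```

Because the averaged gradient equals the gradient over the full dataset (for equal shard sizes), this converges to the same weight as single-node training; frameworks such as PyTorch's distributed data parallelism apply the same principle per mini-batch.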



Key words: Tianhe-3 prototype, deep learning, distributed training, performance evaluation, data parallelism