• Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (03): 416-425.


Performance analysis of distributed deep learning communication architecture

ZHANG Li-zhi, RAN Zhe-jiang, LAI Zhi-quan, LIU Feng

  1. (Parallel and Distributed Key Laboratory of National Defense Technology,
     College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

  • Received: 2020-06-11  Revised: 2020-07-12  Accepted: 2021-03-25  Online: 2021-03-25  Published: 2021-03-26

Abstract: In recent years, advances in deep learning have pushed artificial intelligence into a new era of development. However, massive training data and large-scale models pose increasingly serious challenges to deep learning, and distributed deep learning is an effective way to meet them. An efficient synchronization algorithm is key to the performance of distributed deep learning. Addressing the poor scalability of traditional model-synchronization algorithms when training in parallel across large numbers of nodes, this paper first analyzes the principles and performance of two mainstream parameter communication architectures: the centralized Parameter Server and the decentralized Ring Allreduce. Second, a comparative test environment for the two distributed training frameworks is built with TensorFlow on a Tianhe high-performance GPU cluster. Finally, with the Parameter Server architecture as the baseline, the performance of the Ring Allreduce architecture is measured when training AlexNet and ResNet-50 in the GPU cluster environment. The experimental results show that, with 32 GPUs, the scaling efficiency of the Ring Allreduce architecture reaches 97%, and its distributed training performance is 30% higher than that of the Parameter Server architecture, verifying that Ring Allreduce has better scalability.
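To make the communication pattern concrete, the following is a minimal NumPy sketch, not taken from the paper, that simulates the two phases of Ring Allreduce (reduce-scatter followed by all-gather) across N logical workers. The function name ring_allreduce and the toy gradients are illustrative assumptions; a real deployment would use a framework such as Horovod or NCCL rather than this single-process simulation.

import numpy as np

def ring_allreduce(worker_grads):
    # Simulate Ring Allreduce over a list of per-worker gradient vectors.
    # Each gradient is split into n chunks (n = number of workers); the
    # algorithm takes 2*(n-1) communication steps in total.
    n = len(worker_grads)
    chunks = [list(np.array_split(g.astype(float), n)) for g in worker_grads]

    # Phase 1, reduce-scatter: in step s, worker r sends chunk (r - s) mod n
    # to its right neighbor, which adds it to its own copy. After n-1 steps,
    # worker r holds the fully summed chunk (r + 1) mod n.
    for s in range(n - 1):
        for r in range(n):
            c = (r - s) % n
            chunks[(r + 1) % n][c] = chunks[(r + 1) % n][c] + chunks[r][c]

    # Phase 2, all-gather: the completed chunks travel once around the ring.
    # In step s, worker r forwards chunk (r + 1 - s) mod n to its neighbor,
    # so after n-1 steps every worker holds all n summed chunks.
    for s in range(n - 1):
        for r in range(n):
            c = (r + 1 - s) % n
            chunks[(r + 1) % n][c] = chunks[r][c]

    return [np.concatenate(c) for c in chunks]

# Toy check: 4 workers, every worker should end with the elementwise sum.
grads = [np.arange(8.0) * (w + 1) for w in range(4)]
result = ring_allreduce(grads)
assert all(np.allclose(r, sum(grads)) for r in result)

The sketch also illustrates why the architecture scales: each worker sends and receives 2*(n-1)/n times the gradient size regardless of the number of workers, whereas in a Parameter Server architecture the aggregate traffic at the server grows linearly with the number of workers.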



Key words: Ring Allreduce, parameter server, distributed training, deep learning, deep neural network