
Computer Engineering & Science


Parallel optimization for deep learning based on HPC environment

CHEN Mengqiang,YAN Zijie,YE Yan,WU Weigang   

1. (School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China)
  • Received:2018-07-20 Revised:2018-09-27 Online:2018-11-26 Published:2018-11-25

Abstract:

Deep learning technology has been widely applied for various purposes, especially big data analysis. However, the computational demands of deep learning keep growing in both scale and complexity. To accelerate the training of large-scale deep networks, various distributed parallel training protocols have been proposed. We design a novel asynchronous training protocol, called the weighted asynchronous parallel protocol (WASP), to update neural network parameters more effectively. The core of WASP is how it handles “gradient staleness”: gradients are weighted by a metric based on parameter version numbers, which reduces the influence of stale gradients on the parameters. Moreover, by periodically forcing synchronization of the model parameters, WASP combines the advantages of synchronous and asynchronous training and speeds up training while maintaining a fast and stable convergence rate. We conduct experiments on the Tianhe-2 supercomputing system with two classical convolutional neural networks, LeNet-5 and ResNet-101, and the results show that WASP achieves a much higher speedup and more stable convergence than existing asynchronous parallel training protocols.
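To make the parameter-server update rule concrete, the sketch below illustrates staleness-weighted asynchronous updates with periodic forced synchronization. It is a minimal illustration only: the class name, the 1/(1 + staleness) weighting function, the sync_period value, and the learning rate are assumptions for the example, not the exact definitions used by WASP in the paper.

```python
# Illustrative sketch of a staleness-weighted parameter-server update.
# The weighting function and synchronization period are assumptions;
# the abstract does not specify WASP's exact formulas.
import numpy as np

class WeightedAsyncParameterServer:
    def __init__(self, init_params, sync_period=50, lr=0.01):
        self.params = np.asarray(init_params, dtype=np.float64)
        self.version = 0              # global parameter version number
        self.sync_period = sync_period
        self.lr = lr

    def pull(self):
        """Worker fetches the current parameters and their version number."""
        return self.params.copy(), self.version

    def push(self, gradient, worker_version):
        """Worker pushes a gradient computed against parameters of `worker_version`."""
        staleness = self.version - worker_version   # version-number-based staleness
        weight = 1.0 / (1.0 + staleness)            # assumed decay: staler gradients count less
        self.params -= self.lr * weight * np.asarray(gradient, dtype=np.float64)
        self.version += 1
        # Periodic forced synchronization: every `sync_period` updates, all workers
        # would be required to pull the latest parameters before pushing again
        # (the actual barrier among workers is omitted in this sketch).
        return self.version % self.sync_period == 0  # True => workers must resynchronize

# Usage: two workers pushing gradients computed from different parameter versions.
server = WeightedAsyncParameterServer(init_params=np.zeros(4))
params_a, ver_a = server.pull()
params_b, ver_b = server.pull()
server.push(np.ones(4), ver_a)   # fresh gradient, applied with full weight
server.push(np.ones(4), ver_b)   # one version stale, applied with reduced weight
```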
 

Key words: deep learning, distributed parallelization, Tianhe-2, parameter server, staleness