Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (01): 28-36.
• High Performance Computing •
HUANG Ze-biao,DONG De-zun,QI Xing-yun
Abstract: Collective communication is the dominant communication pattern in distributed deep learning training, and research on optimizing it falls into software-level and hardware-level approaches. SHARP is a collective communication offload protocol proposed by Mellanox that optimizes collective communication in hardware: it offloads collective operations to switches in the network, thereby shortening collective communication time. We integrated SHARP into Gloo and designed and implemented Gloo+, a collective communication library that accelerates distributed deep learning training through in-network computing. Our experimental evaluation shows that in benchmark tests with small message sizes, the speedup of Gloo+ over Gloo can exceed 100, the speedup over MPI in Ethernet mode can exceed 50, and the speedup over MPI in IB mode is within 10. In practical distributed deep learning training, Gloo+ achieves a speedup of up to 1.1 over Gloo, 1.3 over MPI in Ethernet mode, and 0.5 over MPI in IB mode.
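As a rough illustration of the kind of benchmark described above, the sketch below measures allreduce latency across message sizes using PyTorch's torch.distributed with the stock Gloo backend. Gloo+ is not publicly released, so the Gloo+/SHARP path is not shown; the process-group settings, message-size sweep, and iteration counts are assumptions for illustration, not the authors' actual test harness.

```python
# Minimal allreduce latency micro-benchmark over the stock Gloo backend.
# Assumes launch via torchrun (e.g. torchrun --nproc_per_node=4 bench.py),
# which supplies RANK/WORLD_SIZE/MASTER_ADDR/MASTER_PORT for env:// init.
# This only illustrates the style of measurement in the abstract; it does
# not use Gloo+ or SHARP in-network offload.
import time
import torch
import torch.distributed as dist


def bench_allreduce(num_floats: int, iters: int = 100) -> float:
    """Return the average allreduce time in microseconds for a float32 tensor."""
    tensor = torch.ones(num_floats, dtype=torch.float32)
    # Warm up so connection setup is not counted in the measurement.
    for _ in range(5):
        dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
    return (time.perf_counter() - start) / iters * 1e6


def main():
    dist.init_process_group(backend="gloo")  # CPU-side collectives over TCP
    rank = dist.get_rank()
    # Sweep from very small to moderately large messages (4 B to 4 MB of float32 data).
    for num_floats in [1, 16, 256, 4096, 65536, 1048576]:
        avg_us = bench_allreduce(num_floats)
        if rank == 0:
            print(f"{num_floats * 4:>9} bytes  {avg_us:10.1f} us")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Comparing the same sweep against an MPI-based backend (e.g. an OSU-style allreduce benchmark) would mirror the Gloo-vs-MPI comparison reported in the abstract.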
Key words: distributed deep learning, collective communication, in-network computing, Gloo, SHARP
HUANG Ze-biao, DONG De-zun, QI Xing-yun. Gloo+: Accelerating distributed training of deep learning using in-network computing[J]. Computer Engineering & Science, 2024, 46(01): 28-36.
URL: http://joces.nudt.edu.cn/EN/Y2024/V46/I01/28