• Publication of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


Deep learning parallel optimization mechanism
based on dynamic distribution of training data

YAN Zijie,CHEN Mengqiang,WU Weigang   

  1. (School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China)
  • Received:2018-07-13 Revised:2018-09-20 Online:2018-11-26 Published:2018-11-25

Abstract:

To solve the timeconsuming problem of collecting gradient updates under synchronous parallel training, we present a dynamic training data distribution algorithm under parallel synchronization of multiple machines. By calculating the computational efficiency of nodes, the amount of sample data that needs to be processed by nodes is dynamically assigned after each round of iteration. Such a mechanism allows the model to parallelize synchronously and reduce the waiting time it takes for gradient update. Finally, the mechanism is implemented via MXNet and evaluated at Tianhe2 supercomputers. Experimental results show that the proposed optimization mechanism achieves expected results.
 

Key words: deep learning, data assignment, synchronous parallel, parallel training, supercomputing