
Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (01): 42-48.


A model parallel training optimization algorithm for hybrid heterogeneous platforms 

GAO Kai1, GUO Zhen-hua1, CHEN Yong-fang1, WANG Li1, ZHAO Ya-qian1, ZHAO Kun2

  (1. State Key Laboratory of High-End Server & Storage Technology, Inspur Electronic Information Industry Co., Ltd., Jinan 250000;

    2. Guangdong Inspur Big Data Research Co., Ltd., Guangzhou 510000, China)


  • Received: 2020-04-15; Revised: 2020-06-22; Accepted: 2021-01-25; Online: 2021-01-25; Published: 2021-01-22

Abstract: With the development of hybrid heterogeneous platforms, different types of acceleration devices have emerged. Making full use of these devices and deploying deep learning models across multiple computing devices to train large and complex models is becoming increasingly important. Data parallelism (DP) is the most widely used parallelization strategy, but as the number of devices in data-parallel training grows, the communication overhead between devices becomes a bottleneck. In addition, because of differences in device performance, the total batch size processed in each step leads to an accuracy loss, that is, more training epochs are needed to converge to the desired accuracy. These factors lengthen the overall training time and reduce the utilization of some devices. Besides data parallelism, each training step can also be accelerated by model parallelism (MP). This paper proposes a model parallel training optimization algorithm for hybrid heterogeneous platforms. First, to address the uneven distribution of device performance on such platforms, it proposes a partition strategy that mixes layer-level and channel-level model parallelism, and groups some low-performance devices together to shorten the pipeline and ease the communication pressure. Second, to optimize the pipelining between devices, it analyzes the influence of the pipeline fill time and the device utilization on the overall training time, and proposes a micro-batch division method that balances the two. Experiments show that the proposed algorithm achieves a better speedup than the traditional model parallel algorithm: the training speedup on a heterogeneous platform with a single type of device increases by about 4%, and the speedup on the hybrid heterogeneous platform increases by about 7% compared with the previous optimization method.
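The partition and micro-batch ideas summarized above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the Device class, the merge_threshold value, and the pipeline cost model inside choose_micro_batches are illustrative assumptions, used only to show how performance-proportional layer division, grouping of slow devices into one pipeline stage, and selection of the micro-batch count could fit together.

# A minimal sketch (assumptions noted above), not the paper's algorithm.
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    name: str
    perf: float  # relative throughput, e.g. measured samples/sec


def build_stages(devices: List[Device], merge_threshold: float = 0.5) -> List[List[Device]]:
    """Merge devices whose performance falls below merge_threshold * max
    performance into a single shared stage, so the pipeline has fewer,
    more balanced stages. The threshold is an illustrative assumption."""
    top = max(d.perf for d in devices)
    fast = [[d] for d in devices if d.perf >= merge_threshold * top]
    slow = [d for d in devices if d.perf < merge_threshold * top]
    return fast + ([slow] if slow else [])


def partition_layers(num_layers: int, stages: List[List[Device]]) -> List[int]:
    """Assign a contiguous block of layers to each stage, proportional to the
    stage's aggregate performance."""
    weights = [sum(d.perf for d in stage) for stage in stages]
    total = sum(weights)
    counts = [max(1, round(num_layers * w / total)) for w in weights]
    counts[-1] += num_layers - sum(counts)  # absorb rounding error
    return counts


def choose_micro_batches(batch_size: int, num_stages: int,
                         per_sample_time: float,
                         per_micro_batch_overhead: float) -> int:
    """Pick a micro-batch count m under a simple (assumed) cost model:
    the pipeline fill time shrinks as m grows, while fixed per-micro-batch
    overhead grows with m; choose the m that minimizes the sum."""
    best_m, best_cost = 1, float("inf")
    for m in range(1, batch_size + 1):
        if batch_size % m:
            continue
        micro = batch_size // m                      # samples per micro-batch
        fill = (num_stages - 1) * micro * per_sample_time
        steady = batch_size * per_sample_time + m * per_micro_batch_overhead
        cost = fill + steady
        if cost < best_cost:
            best_m, best_cost = m, cost
    return best_m


if __name__ == "__main__":
    devs = [Device("gpu0", 1.0), Device("gpu1", 0.9),
            Device("fpga0", 0.3), Device("fpga1", 0.25)]
    stages = build_stages(devs)                      # slow devices share a stage
    print("layers per stage:", partition_layers(48, stages))
    print("micro-batches:", choose_micro_batches(64, len(stages), 1.0, 2.0))

In this toy run the two slow devices form one merged stage, the 48 layers are split roughly in proportion to stage performance, and the micro-batch count settles where the pipeline fill time and the per-micro-batch overhead balance each other.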




Key words: hybrid heterogeneous, model parallel, micro-batch, device difference