• A journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2020, Vol. 42 ›› Issue (09): 1529-1537.

• High Performance Computing •


An automatic model splitting strategy generation method for model parallel training

WANG Li 1, GUO Zhen-hua 1, CAO Fang 1, GAO Kai 1, ZHAO Ya-qian 1, ZHAO Kun 2

  (1. State Key Laboratory of High-End Server & Storage Technology, Inspur Electronic Information Industry Co., Ltd., Jinan, Shandong 250000, China;

    2. Guangdong Inspur Big Data Research Co., Ltd., Guangzhou, Guangdong 510000, China)

  • Received: 2020-04-08  Revised: 2020-06-11  Accepted: 2020-09-25  Online: 2020-09-25  Published: 2020-09-24


Abstract: As training data grows in scale and models become increasingly complex, the cost of training deep neural networks keeps rising and places ever higher compute demands on the underlying platform, making parallel model training an urgent need for timely applications. In recent years, AI accelerators for distributed training (such as FPGAs, TPUs, and dedicated AI chips) have emerged in large numbers, providing the hardware foundation for parallel training of deep neural networks. To exploit all of these hardware resources, researchers must train neural network models in parallel on platforms that combine AI accelerators of different compute capabilities and hardware architectures. How to use such heterogeneous accelerator resources efficiently while keeping the training workload balanced across them has therefore been a central research concern. This paper proposes an automatic model splitting strategy generation method for model parallel training: it derives a splitting strategy from the static network model and maps that strategy onto training, assigning network layers to different AI accelerators. The strategies generated by this method make efficient use of all computing resources on a single platform and keep the model training workload balanced across devices. Compared with the manual splitting strategies currently in use, the method is far more timely, cutting strategy generation time by a factor of more than 100, and it reduces the uncertainty introduced by human factors.



Key words: model parallelism, model training, model splitting, load balancing
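
The paper's own strategy-generation algorithm is not reproduced on this page. As a minimal illustrative sketch of the underlying idea only — assigning contiguous network layers to accelerators so that the maximum per-device load is minimized — one could binary-search the smallest feasible load cap and then greedily form layer groups. All names (`split_layers`, `fits`) and the per-layer cost figures below are hypothetical, not from the paper:

```python
# Hypothetical sketch: split a static sequence of per-layer compute costs into
# contiguous groups, one per accelerator, minimizing the maximum group load.
from typing import List


def fits(costs: List[float], k: int, cap: float) -> bool:
    """Can the layers be packed into at most k contiguous groups, each <= cap?"""
    groups, load = 1, 0.0
    for c in costs:
        if c > cap:
            return False          # a single layer already exceeds the cap
        if load + c > cap:
            groups += 1           # start a new group on the next device
            load = c
        else:
            load += c
    return groups <= k


def split_layers(costs: List[float], k: int) -> List[List[int]]:
    """Binary-search the smallest feasible max load, then greedily form groups."""
    lo, hi = max(costs), float(sum(costs))
    while hi - lo > 1e-6:
        mid = (lo + hi) / 2
        if fits(costs, k, mid):
            hi = mid              # cap is feasible: try a tighter one
        else:
            lo = mid              # cap is infeasible: loosen it
    # Greedy assignment of layer indices under the found capacity hi
    groups: List[List[int]] = []
    cur: List[int] = []
    load = 0.0
    for i, c in enumerate(costs):
        if cur and load + c > hi:
            groups.append(cur)
            cur, load = [], 0.0
        cur.append(i)
        load += c
    groups.append(cur)
    return groups


layer_costs = [4.0, 2.0, 7.0, 3.0, 3.0, 5.0]   # assumed per-layer costs
print(split_layers(layer_costs, 3))            # layer indices per device
```

In practice the per-layer costs would come from profiling each layer on each accelerator type, and a heterogeneous platform would further weight the cap per device by its compute capability; this sketch assumes identical devices for brevity.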