• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

计算机工程与科学 (Computer Engineering & Science) ›› 2021, Vol. 43 ›› Issue (07): 1160-1167.

• High Performance Computing •

分布式训练异构任务调度算法研究

杨坚伟,孟敏,黄家乐,武继刚   

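  1. (School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, Guangdong)
  • Received: 2020-09-10 Revised: 2020-11-13 Accepted: 2021-07-25 Online: 2021-07-25 Published: 2021-08-16
  • Funding:
    National Natural Science Foundation of China (61702114); Natural Science Foundation of Guangdong Province (2020A1515011361)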

Scheduling of heterogeneous tasks for distributed training

YANG Jian-wei, MENG Min, HUANG Jia-le, WU Ji-gang

  1. (School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China)
  • Received: 2020-09-10 Revised: 2020-11-13 Accepted: 2021-07-25 Online: 2021-07-25 Published: 2021-08-16

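Abstract: Worker nodes in distributed machine learning often have to process heterogeneous tasks during training, yet the task publisher may lack effective prior knowledge for determining which nodes in the edge server cluster are currently training. To address the problem that an edge server cluster cannot maximize training performance and quality of service at the same time, this paper studies a heterogeneous task scheduling algorithm. First, the factors that affect the convergence performance of distributed training are analyzed under cluster resource constraints; second, an optimization objective that maximizes training performance is established; finally, the problem is transformed into a multidimensional multiple-choice knapsack problem and solved. Simulation results show that the proposed heterogeneous task scheduling algorithm maximizes distributed training performance while guaranteeing quality of service.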

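Keywords: distributed training, training performance, heterogeneous task scheduling, multidimensional multiple-choice knapsack, convergence analysis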

Abstract: Workers in distributed machine learning often need to handle heterogeneous tasks during training. However, the task publisher may lack the prior knowledge needed to determine which workers in the edge server (ES) cluster are currently training. To tackle the problem that the ES cluster cannot maximize training performance and quality of service at the same time, a scheduling algorithm for heterogeneous tasks is proposed. First, the factors influencing the convergence performance of distributed training are analyzed under the cluster's resource constraints. Second, an optimization objective that maximizes training performance is established. Finally, the optimization problem is transformed into a multidimensional multiple-choice knapsack problem and solved. Simulation results show that the proposed scheduling algorithm maximizes the performance of distributed training while ensuring quality of service.


Key words: distributed training, training performance, scheduling of heterogeneous tasks, multidimensional multiple-choice knapsack problem, convergence analysis
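
For readers unfamiliar with the multidimensional multiple-choice knapsack problem (MMKP) named in the abstract, the following is a minimal sketch of its general form, not the algorithm proposed in the paper. It assumes that each task is a group of candidate options, that every option has a value (standing in for a training-performance contribution) and a resource vector, and that exactly one option is chosen per group without exceeding the cluster capacity in any dimension. The function name `solve_mmkp`, the two resource dimensions (CPU, memory), and all numbers are hypothetical. The exhaustive search is only practical for tiny instances.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

# One candidate option: (value, resource usage per dimension).
Option = Tuple[float, Tuple[int, ...]]


def solve_mmkp(
    groups: Sequence[Sequence[Option]],
    capacity: Sequence[int],
) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Exhaustively solve a tiny MMKP instance.

    Exactly one option is picked from every group, and the summed
    resource usage must not exceed `capacity` in any dimension.
    Returns (chosen option index per group, total value), or
    (None, -inf) when no feasible selection exists.
    """
    best_choice: Optional[Tuple[int, ...]] = None
    best_value = float("-inf")
    # Enumerate every way of picking one option index per group.
    for choice in product(*(range(len(g)) for g in groups)):
        usage = [0] * len(capacity)
        value = 0.0
        for group, idx in zip(groups, choice):
            v, resources = group[idx]
            value += v
            for dim, r in enumerate(resources):
                usage[dim] += r
        feasible = all(u <= c for u, c in zip(usage, capacity))
        if feasible and value > best_value:
            best_choice, best_value = choice, value
    return best_choice, best_value


# Hypothetical instance: 3 tasks, 2 options each, resources = (CPU, memory).
groups = [
    [(5.0, (2, 3)), (9.0, (4, 5))],
    [(4.0, (1, 2)), (7.0, (3, 4))],
    [(3.0, (2, 1)), (6.0, (4, 3))],
]
capacity = (8, 9)

choice, value = solve_mmkp(groups, capacity)
print(choice, value)
```

The search space grows as the product of the group sizes, so realistic instances would need a heuristic or a dynamic-programming approach instead of this enumeration.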