Scheduling of heterogeneous tasks for distributed training

Computer Engineering & Science, 2021, Vol. 43, Issue 7, pp. 1160-1167.

YANG Jian-wei, MENG Min, HUANG Jia-le, WU Ji-gang

(School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China)

Received: 2020-09-10; Revised: 2020-11-13; Online: 2021-07-25; Published: 2021-08-16

Abstract

Workers in distributed machine learning often have to handle heterogeneous tasks during training. However, the task publisher may lack the effective prior knowledge needed to determine which workers in the edge server (ES) cluster are currently training. To address the problem that the ES cluster cannot simultaneously maximize training performance and guarantee quality of service, a scheduling algorithm for heterogeneous tasks is proposed. First, the factors that influence the convergence performance of distributed training are analyzed under the cluster's resource constraints. Second, an optimization objective that maximizes training performance is established. Finally, the optimization problem is transformed into a multidimensional multiple-choice knapsack problem. Simulation results show that the proposed scheduling algorithm can maximize the performance of distributed training while ensuring quality of service.
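
The abstract states only that scheduling is cast as a multidimensional multiple-choice knapsack problem (MMKP); the paper's actual utility model and resource constraints are not given here. As a rough illustration of what that problem shape looks like, the sketch below assumes each task is a group of candidate placements, each placement has a hypothetical utility (standing in for a training-performance score) and an integer demand on several cluster resources, and exactly one placement must be chosen per task without exceeding any resource capacity. The function name `solve_mmkp` and all numbers are hypothetical, and this exact dynamic program is a generic textbook approach, not the authors' algorithm.

```python
from typing import List, Optional, Sequence, Tuple

# One candidate placement for a task: (utility, resource demand per dimension).
# Utilities and demands are hypothetical illustrative values.
Option = Tuple[float, Tuple[int, ...]]


def solve_mmkp(groups: Sequence[Sequence[Option]],
               capacity: Sequence[int]) -> Tuple[float, Optional[List[int]]]:
    """Exact MMKP by dynamic programming over groups (small integer instances).

    Chooses exactly one option from every group so that the summed demand
    stays within `capacity` in every dimension and the summed utility is
    maximal. Returns (best_utility, chosen option index per group), or
    (-inf, None) when no feasible selection exists.
    """
    dims = len(capacity)
    # State: total resource usage vector -> (best utility, choices so far).
    # Only the usage vector matters for future feasibility, so keeping the
    # best utility per usage vector is sufficient for optimality.
    states = {tuple([0] * dims): (0.0, [])}
    for group in groups:
        next_states = {}
        for used, (util, choice) in states.items():
            for idx, (u, demand) in enumerate(group):
                new_used = tuple(a + b for a, b in zip(used, demand))
                if any(x > c for x, c in zip(new_used, capacity)):
                    continue  # exceeds capacity in some resource dimension
                cand = (util + u, choice + [idx])
                best = next_states.get(new_used)
                if best is None or cand[0] > best[0]:
                    next_states[new_used] = cand
        states = next_states
        if not states:
            return float("-inf"), None  # no feasible one-per-group choice
    return max(states.values(), key=lambda s: s[0])


if __name__ == "__main__":
    # Two hypothetical tasks, two hypothetical resource dimensions.
    tasks = [
        [(5.0, (1, 2)), (8.5, (2, 3))],  # task A: two candidate placements
        [(4.0, (1, 1)), (7.0, (2, 2))],  # task B: two candidate placements
    ]
    print(solve_mmkp(tasks, capacity=(3, 4)))
```

With capacity (3, 4), choosing the richer option for both tasks would need (4, 5) and is infeasible, so the best selection is option 1 for task A and option 0 for task B, printing `(12.5, [1, 0])`. Because the MMKP is NP-hard and this state space grows with the product of the capacities, the exact DP is only practical for small instances; larger schedulers typically rely on heuristics or approximations.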

Key words: distributed training, training performance, scheduling of heterogeneous tasks, multidimensional multiple-choice knapsack problem, convergence analysis

YANG Jian-wei, MENG Min, HUANG Jia-le, WU Ji-gang. Scheduling of heterogeneous tasks for distributed training[J]. Computer Engineering & Science, 2021, 43(7): 1160-1167.
