Computer Engineering & Science, 2024, Vol. 46, Issue (01): 1-11.
High Performance Computing
SUN Qing-xiao,LIU Yi,YANG Hai-long,WANG Yi-qing,JIA Jie,LUAN Zhong-zhi,QIAN De-pei
Received: 2022-12-28; Revised: 2023-03-04; Accepted: 2024-01-25; Online: 2024-01-25; Published: 2024-01-15
SUN Qing-xiao, LIU Yi, YANG Hai-long, WANG Yi-qing, JIA Jie, LUAN Zhong-zhi, QIAN De-pei. GNNSched: A GNN inference task scheduling framework on GPU[J]. Computer Engineering & Science, 2024, 46(01): 1-11.
[1] Xiao W C,Bhardwaj R,Ramjee R,et al.Gandiva: Introspective cluster scheduling for deep learning[C]∥Proc of the 13th USENIX Conference on Operating Systems Design and Implementation,2018: 595-610.
[2] Wu Z H,Pan S R,Chen F W,et al.A comprehensive survey on graph neural networks[J].IEEE Transactions on Neural Networks and Learning Systems,2020,32(1): 4-24.
[3] Huang K Z,Zhai J D,Zheng Z,et al.Understanding and bridging the gaps in current GNN performance optimizations[C]∥Proc of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming,2021: 119-132.
[4] Fey M,Lenssen J E.Fast graph representation learning with PyTorch Geometric[J].arXiv:1903.02428,2019.
[5] Wang M J,Zheng D,Ye Z H,et al.Deep Graph Library: A graph-centric,highly-performant package for graph neural networks[J].arXiv:1909.01315,2019.
[6] Xiao W C,Ren S R,Li Y,et al.AntMan: Dynamic scaling on GPU clusters for deep learning[C]∥Proc of the 14th USENIX Conference on Operating Systems Design and Implementation,2020: 533-548.
[7] Wu X F,Rao J,Chen W,et al.SwitchFlow: Preemptive multitasking for deep learning[C]∥Proc of the 22nd International Middleware Conference,2021: 146-158.
[8] Sun Q X,Liu Y,Yang H L,et al.QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU[J].Parallel Computing,2022,113(C): 102958.
[9] Bai Z H,Zhang Z,Zhu Y B,et al.PipeSwitch: Fast pipelined context switching for deep learning applications[C]∥Proc of the 14th USENIX Conference on Operating Systems Design and Implementation,2020: 499-514.
[10] Han M C,Zhang H Z,Chen R,et al.Microsecond-scale preemption for concurrent GPU-accelerated DNN inferences[C]∥Proc of the 16th USENIX Conference on Operating Systems Design and Implementation,2022: 539-558.
[11] Cui W H,Zhao H,Chen Q,et al.Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction[C]∥Proc of the International Conference for High Performance Computing,Networking,Storage and Analysis,2021: Article No. 15.
[12] Choi S,Lee S,Kim Y,et al.Serving heterogeneous machine learning models on multi-GPU servers with spatio-temporal sharing[C]∥Proc of USENIX Annual Technical Conference,2022: 199-216.
[13] Wang Y K,Feng B Y,Li G S,et al.GNNAdvisor: An adaptive and efficient runtime system for GNN acceleration on GPUs[C]∥Proc of the 15th USENIX Conference on Operating Systems Design and Implementation,2021: 515-531.
[14] Dhakal A,Kulkarni S G,Ramakrishnan K K.GSLICE: Controlled spatial sharing of GPUs for a scalable inference platform[C]∥Proc of the 11th ACM Symposium on Cloud Computing,2020: 492-506.
[15] Sun Q X,Liu Y,Yang H L,et al.CoGNN: Efficient scheduling for concurrent GNN training on GPUs[C]∥Proc of the International Conference for High Performance Computing,Networking,Storage and Analysis,2022: 1-15.
[16] Peng Y H,Bao Y X,Chen Y R,et al.Optimus: An efficient dynamic resource scheduler for deep learning clusters[C]∥Proc of the 13th European Conference on Computer Systems,2018: 1-14.
[17] Kipf T N,Welling M.Semi-supervised classification with graph convolutional networks[J].arXiv:1609.02907,2016.
[18] Hamilton W L,Ying R,Leskovec J.Inductive representation learning on large graphs[C]∥Proc of the 31st International Conference on Neural Information Processing Systems,2017: 1025-1035.
[19] Xu K,Hu W H,Leskovec J,et al.How powerful are graph neural networks[J].arXiv:1810.00826,2018.
[20] Li J J,Louri A,Karanth A,et al.GCNAX: A flexible and energy-efficient accelerator for graph convolutional neural networks[C]∥Proc of 2021 IEEE International Symposium on High-Performance Computer Architecture,2021: 775-788.
[21] Shen H C,Chen L Q,Jin Y C,et al.Nexus: A GPU cluster engine for accelerating DNN-based video analysis[C]∥Proc of the 27th ACM Symposium on Operating Systems Principles,2019: 322-337.
[22] Gao Y J,Liu Y,Zhang H Y,et al.Estimating GPU memory consumption of deep learning models[C]∥Proc of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering,2020: 1342-1352.
[23] PyTorch: The topological sorting algorithm for computation graphs in PyTorch[EB/OL].[2021-11-12]. https://github.com/pytorch/pytorch/blob/v1.8.0/caffe2/core/nomnigraph/include/nomnigraph/Graph/TopoSort.h.
[24] Gu J C,Chowdhury M,Shin K G,et al.Tiresias: A GPU cluster manager for distributed deep learning[C]∥Proc of the 16th USENIX Conference on Networked Systems Design and Implementation,2019: 485-500.
[25] Hu Q H,Sun P,Yan S G,et al.Characterization and prediction of deep learning workloads in large-scale GPU datacenters[C]∥Proc of the International Conference for High Performance Computing,Networking,Storage and Analysis,2021: Article No. 104.
[26] Paszke A,Gross S,Massa F,et al.PyTorch: An imperative style,high-performance deep learning library[C]∥Proc of the 33rd International Conference on Neural Information Processing Systems,2019: 8026-8037.
[27] Hu Y W,Ye Z H,Wang M J,et al.FeatGraph: A flexible and efficient backend for graph neural network systems[C]∥Proc of the International Conference for High Performance Computing,Networking,Storage and Analysis,2020: Article No. 71.
[28] Wang Y K,Feng B Y,Ding Y F.QGTC: Accelerating quantized graph neural networks via GPU Tensor core[C]∥Proc of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming,2022: 107-119.
[29] Crankshaw D,Wang X,Zhou G,et al.Clipper: A low-latency online prediction serving system[C]∥Proc of the 14th USENIX Conference on Networked Systems Design and Implementation,2017: 613-627.
[30] Zhang C L,Yu M C,Wang W,et al.MArk: Exploiting cloud services for cost-effective,SLO-aware machine learning inference serving[C]∥Proc of USENIX Annual Technical Conference,2019: 1049-1062.
[31] Gujarati A,Karimi R,Alzayat S,et al.Serving DNNs like clockwork: Performance predictability from the bottom up[C]∥Proc of USENIX Symposium on Operating Systems Design and Implementation,2020: 443-462.
Copyright © Computer Engineering & Science, All Rights Reserved.
Address: 109 Deya Rd, Changsha, Hunan (410073). Tel: 0731-87002567. Email: jsjgcykx@vip.163.com