[1] Xiao W C, Bhardwaj R, Ramjee R, et al. Gandiva: Introspective cluster scheduling for deep learning[C]∥Proc of the 13th USENIX Conference on Operating Systems Design and Implementation, 2018: 595-610.
[2] Wu Z H, Pan S R, Chen F W, et al. A comprehensive survey on graph neural networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(1): 4-24.
[3] Huang K Z, Zhai J D, Zheng Z, et al. Understanding and bridging the gaps in current GNN performance optimizations[C]∥Proc of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021: 119-132.
[4] Fey M, Lenssen J E. Fast graph representation learning with PyTorch Geometric[J]. arXiv:1903.02428, 2019.
[5] Wang M J, Zheng D, Ye Z H, et al. Deep Graph Library: A graph-centric, highly-performant package for graph neural networks[J]. arXiv:1909.01315, 2019.
[6] Xiao W C, Ren S R, Li Y, et al. AntMan: Dynamic scaling on GPU clusters for deep learning[C]∥Proc of the 14th USENIX Conference on Operating Systems Design and Implementation, 2020: 533-548.
[7] Wu X F, Rao J, Chen W, et al. SwitchFlow: Preemptive multitasking for deep learning[C]∥Proc of the 22nd International Middleware Conference, 2021: 146-158.
[8] Sun Q X, Liu Y, Yang H L, et al. QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU[J]. Parallel Computing, 2022, 113(C): 102958.
[9] Bai Z H, Zhang Z, Zhu Y B, et al. PipeSwitch: Fast pipelined context switching for deep learning applications[C]∥Proc of the 14th USENIX Conference on Operating Systems Design and Implementation, 2020: 499-514.
[10] Han M C, Zhang H Z, Chen R, et al. Microsecond-scale preemption for concurrent GPU-accelerated DNN inferences[C]∥Proc of the 16th USENIX Conference on Operating Systems Design and Implementation, 2022: 539-558.
[11] Cui W H, Zhao H, Chen Q, et al. Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction[C]∥Proc of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021: Article No. 15.
[12] Choi S, Lee S, Kim Y, et al. Serving heterogeneous machine learning models on multi-GPU servers with spatio-temporal sharing[C]∥Proc of USENIX Annual Technical Conference, 2022: 199-216.
[13] Wang Y K, Feng B Y, Li G S, et al. GNNAdvisor: An adaptive and efficient runtime system for GNN acceleration on GPUs[C]∥Proc of the 15th USENIX Conference on Operating Systems Design and Implementation, 2021: 515-531.
[14] Dhakal A, Kulkarni S G, Ramakrishnan K K. GSLICE: Controlled spatial sharing of GPUs for a scalable inference platform[C]∥Proc of the 11th ACM Symposium on Cloud Computing, 2020: 492-506.
[15] Sun Q X, Liu Y, Yang H L, et al. CoGNN: Efficient scheduling for concurrent GNN training on GPUs[C]∥Proc of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2022: 1-15.
[16] Peng Y H, Bao Y X, Chen Y R, et al. Optimus: An efficient dynamic resource scheduler for deep learning clusters[C]∥Proc of the 13th European Conference on Computer Systems, 2018: 1-14.
[17] Kipf T N, Welling M. Semi-supervised classification with graph convolutional networks[J]. arXiv:1609.02907, 2016.
[18] Hamilton W, Ying R, Leskovec J. Inductive representation learning on large graphs[C]∥Proc of the 31st International Conference on Neural Information Processing Systems, 2017: 1025-1035.
[19] Xu K, Hu W H, Leskovec J, et al. How powerful are graph neural networks?[J]. arXiv:1810.00826, 2018.
[20] Li J J, Louri A, Karanth A, et al. GCNAX: A flexible and energy-efficient accelerator for graph convolutional neural networks[C]∥Proc of the 2021 IEEE International Symposium on High-Performance Computer Architecture, 2021: 775-788.
[21] Shen H C, Chen L Q, Jin Y C, et al. Nexus: A GPU cluster engine for accelerating DNN-based video analysis[C]∥Proc of the 27th ACM Symposium on Operating Systems Principles, 2019: 322-337.
[22] Gao Y J, Liu Y, Zhang H Y, et al. Estimating GPU memory consumption of deep learning models[C]∥Proc of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2020: 1342-1352.
[23] PyTorch: The topological sorting algorithm for computation graphs in PyTorch[EB/OL]. [2021-11-12]. https://github.com/pytorch/pytorch/blob/v1.8.0/caffe2/core/nomnigraph/include/nomnigraph/Graph/TopoSort.h.
[24] Gu J C, Chowdhury M, Shin K G, et al. Tiresias: A GPU cluster manager for distributed deep learning[C]∥Proc of the 16th USENIX Conference on Networked Systems Design and Implementation, 2019: 485-500.
[25] Hu Q H, Sun P, Yan S G, et al. Characterization and prediction of deep learning workloads in large-scale GPU datacenters[C]∥Proc of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2021: Article No. 104.
[26] Paszke A, Gross S, Massa F, et al. PyTorch: An imperative style, high-performance deep learning library[C]∥Proc of the 33rd International Conference on Neural Information Processing Systems, 2019: 8026-8037.
[27] Hu Y W, Ye Z H, Wang M J, et al. FeatGraph: A flexible and efficient backend for graph neural network systems[C]∥Proc of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2020: Article No. 71.
[28] Wang Y K, Feng B Y, Ding Y F. QGTC: Accelerating quantized graph neural networks via GPU Tensor Core[C]∥Proc of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2022: 107-119.
[29] Crankshaw D, Wang X, Zhou G, et al. Clipper: A low-latency online prediction serving system[C]∥Proc of the 14th USENIX Conference on Networked Systems Design and Implementation, 2017: 613-627.
[30] Zhang C L, Yu M C, Wang W, et al. MArk: Exploiting cloud services for cost-effective, SLO-aware machine learning inference serving[C]∥Proc of USENIX Annual Technical Conference, 2019: 1049-1062.
[31] Gujarati A, Karimi R, Alzayat S, et al. Serving DNNs like Clockwork: Performance predictability from the bottom up[C]∥Proc of USENIX Symposium on Operating Systems Design and Implementation, 2020: 443-462.