Computer Engineering & Science, 2024, Vol. 46, Issue (01): 1-11.

• High Performance Computing •

GNNSched: A GNN inference task scheduling framework on GPU

SUN Qing-xiao, LIU Yi, YANG Hai-long, WANG Yi-qing, JIA Jie, LUAN Zhong-zhi, QIAN De-pei

  1. (School of Computer Science and Engineering, Beihang University, Beijing 100191, China)
  • Received: 2022-12-28 Revised: 2023-03-04 Accepted: 2024-01-25 Online: 2024-01-25 Published: 2024-01-15

Abstract: Due to frequent memory accesses, graph neural networks (GNNs) often suffer from low resource utilization when running on GPUs. Existing inference frameworks do not account for the irregularity of GNN inputs and may exceed GPU memory capacity when applied directly to GNN inference tasks. For GNN inference tasks, the memory occupancy of concurrent tasks must therefore be estimated in advance from their input characteristics, so that the tasks can be successfully co-located on the GPU. In addition, inference tasks submitted in multi-tenant scenarios urgently need flexible scheduling strategies to meet the quality-of-service requirements of concurrent inference tasks. To address these problems, this paper proposes GNNSched, a framework that efficiently manages the co-location of GNN inference tasks on GPU. Specifically, GNNSched organizes concurrent inference tasks into a queue and estimates the memory occupancy of each task at the operator level using a cost function. GNNSched implements multiple scheduling strategies to generate task groups, which are iteratively submitted to the GPU for concurrent execution. Experimental results show that GNNSched meets the quality-of-service requirements of concurrent GNN inference tasks and reduces their response time.
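To make the described workflow concrete, the sketch below illustrates the kind of capacity-aware task grouping the abstract outlines. It is a minimal, hypothetical Python rendition, not the paper's implementation: the `InferenceTask` descriptor, the simplified operator-level cost function (a sum of per-layer aggregation and update buffer sizes), and the first-fit grouping strategy are all illustrative assumptions; GNNSched's actual cost model and scheduling strategies are defined in the paper itself.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class InferenceTask:
    """Hypothetical descriptor of one GNN inference request."""
    name: str
    # One (num_nodes, num_edges, feature_dim) tuple per GNN layer.
    layers: list


def estimate_memory(task: InferenceTask, bytes_per_elem: int = 4) -> int:
    """Simplified operator-level cost function (an assumption, not the
    paper's model): sum the message buffer of each edge-wise aggregation
    operator and the output buffer of each node-wise update operator."""
    total = 0
    for num_nodes, num_edges, feat_dim in task.layers:
        aggregate = num_edges * feat_dim   # edge messages produced by aggregation
        update = num_nodes * feat_dim      # node features produced by update
        total += (aggregate + update) * bytes_per_elem
    return total


def schedule(tasks, gpu_capacity: int):
    """First-fit grouping: pack queued tasks into groups whose combined
    estimated footprint stays under the GPU memory capacity; each group
    would then be submitted to the GPU for concurrent execution."""
    queue = deque(tasks)
    groups = []
    while queue:
        group, used = [], 0
        while queue and used + estimate_memory(queue[0]) <= gpu_capacity:
            task = queue.popleft()
            used += estimate_memory(task)
            group.append(task)
        if not group:  # a single task exceeds capacity: run it alone
            group.append(queue.popleft())
        groups.append(group)
    return groups


if __name__ == "__main__":
    tasks = [
        InferenceTask(f"t{i}", [(10_000 * (i + 1), 50_000 * (i + 1), 128)] * 2)
        for i in range(4)
    ]
    for group in schedule(tasks, gpu_capacity=200 * 2**20):  # 200 MiB budget
        names = [t.name for t in group]
        mib = sum(estimate_memory(t) for t in group) // 2**20
        print(names, mib, "MiB")
```

Under this toy cost function, the four queued tasks pack into three groups that each fit the 200 MiB budget; a real deployment would substitute the paper's per-operator cost model and choose among GNNSched's scheduling strategies when forming groups.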

Key words: graph neural network (GNN), graphics processing unit (GPU), inference framework, task scheduling, estimation model