• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (01): 1-11.

• High Performance Computing •

GNNSched: A GNN inference task scheduling framework on GPU

SUN Qing-xiao, LIU Yi, YANG Hai-long, WANG Yi-qing, JIA Jie, LUAN Zhong-zhi, QIAN De-pei

  1. (School of Computer Science and Engineering, Beihang University, Beijing 100191, China)
  • Received: 2022-12-28  Revised: 2023-03-04  Accepted: 2024-01-25  Online: 2024-01-25  Published: 2024-01-15
  • Supported by: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0117805); National Natural Science Foundation of China (62072018, 62322201, U22A2028); Fundamental Research Funds for the Central Universities (YWF-23-L-1121)

Abstract: Due to frequent GPU memory accesses, graph neural networks (GNNs) often exhibit low resource utilization when running on GPUs. Existing inference frameworks do not consider the irregularity of GNN inputs, so directly applying them to co-locate GNN inference tasks may exceed GPU memory capacity and cause task failures. For GNN inference tasks, the memory occupation of concurrent tasks must be analyzed in advance, based on their input characteristics, to ensure that concurrent tasks can be successfully co-located on the GPU. In addition, inference tasks submitted in multi-tenant scenarios urgently need flexible scheduling strategies to meet the quality-of-service requirements of concurrent inference tasks. To solve these problems, this paper proposes GNNSched, which efficiently manages the co-located execution of GNN inference tasks on GPU. Specifically, GNNSched organizes concurrent inference tasks into a queue and estimates the memory occupation of each task at the operator level using cost functions. GNNSched implements multiple scheduling strategies to generate task groups, which are iteratively submitted to the GPU for concurrent execution. Experimental results show that GNNSched meets the quality-of-service requirements of concurrent GNN inference tasks and reduces their response time.
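To make the described workflow concrete, the sketch below illustrates the pipeline summarized in the abstract: a task queue, operator-level memory estimation via cost functions, greedy grouping under a GPU memory budget, and iterative group submission. It is a minimal illustration only; all names (Task, OP_COST_FNS, estimate_memory, group_tasks) and the cost formulas are hypothetical stand-ins, not GNNSched's actual interfaces or models.

```python
# Minimal sketch of the scheduling idea: queue -> operator-level memory
# estimation -> task grouping -> iterative concurrent submission.
# All names and cost formulas here are illustrative assumptions.

from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Task:
    model: str        # GNN model name, e.g. "GCN" (hypothetical field)
    num_nodes: int    # input graph size
    num_edges: int
    feat_dim: int     # input feature dimension
    hidden_dim: int   # hidden layer dimension

# Per-operator cost functions: map a task's input shape to an estimated
# memory footprint in bytes (4-byte floats assumed).
OP_COST_FNS: Dict[str, Callable[[Task], int]] = {
    # Message passing: edge-wise messages plus the aggregated node output
    "aggregate": lambda t: 4 * (t.num_edges + t.num_nodes) * t.hidden_dim,
    # Dense update: weight matrix plus output activations
    "update": lambda t: 4 * (t.feat_dim * t.hidden_dim
                             + t.num_nodes * t.hidden_dim),
}

def estimate_memory(task: Task) -> int:
    """Sum operator-level estimates into the task's memory footprint."""
    return sum(fn(task) for fn in OP_COST_FNS.values())

def group_tasks(queue: deque, budget: int) -> List[Task]:
    """Greedily pack queued tasks into one group that fits the budget."""
    group, used = [], 0
    while queue and used + estimate_memory(queue[0]) <= budget:
        task = queue.popleft()
        used += estimate_memory(task)
        group.append(task)
    return group

# Iteratively submit groups for concurrent execution until the queue drains.
queue = deque([Task("GCN", 10_000, 50_000, 128, 64),
               Task("GAT", 200_000, 1_000_000, 256, 128)])
GPU_MEMORY_BUDGET = 8 * 1024**3  # assume an 8 GB memory budget
while queue:
    group = group_tasks(queue, GPU_MEMORY_BUDGET)
    if not group:                # a single oversized task: run it alone
        group = [queue.popleft()]
    print(f"launch {len(group)} task(s) concurrently")  # stand-in for launch
```

The greedy first-fit grouping shown here is only one possible policy; the paper's point is that several such strategies can be plugged in once per-task memory estimates are available.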

Key words: graph neural network (GNN), graphics processing unit (GPU), inference framework, task scheduling, estimation model