Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (9): 1563-1570.

• High Performance Computing •

CPWS: A checkpoint-based multi-level warp scheduler for GPGPU

JIANG Zekun, YUAN Bo, CUI Jianfeng, HUANG Libo, CHANG Junsheng, LIU Sheng

  1. (College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)
  • Received: 2024-10-23 Revised: 2024-11-01 Publication date: 2025-09-25 Release date: 2025-09-22
  • Supported by:
    National "Ten Thousand Talents Program" Young Top-Notch Talent Support Program (ZD0202082503)

Abstract: General-purpose graphics processing units (GPGPUs) adopt the single-instruction multiple-thread (SIMT) model, which allows a large number of threads to execute the same instruction simultaneously and thereby significantly improves computing efficiency. Under the SIMT model, a GPGPU organizes a group of threads into a logical execution unit called a warp. Because the hardware must time-multiplex its resources among multiple warps, warp scheduling is crucial for efficient parallel computing. This paper adds new checkpoint instructions and uses them to design and implement CPWS, a checkpoint-based multi-level warp scheduler. CPWS tracks the execution progress of each warp and dynamically adjusts its scheduling strategy based on that progress, with low overall hardware overhead. Experimental results show that CPWS improves performance by 11% over the greedy-then-oldest (GTO) scheduler, by 16.7% over the loose round-robin (LRR) scheduler, and by 10.6% over the two-level round-robin scheduler. In addition, FPGA synthesis results indicate that CPWS uses only 0.8% more logic units than GTO.

Key words: general-purpose graphics processing unit (GPGPU), checkpoint, warp scheduler
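
To make the comparison in the abstract concrete, the sketch below models how a warp scheduler picks the next warp to issue. `pick_lrr` and `pick_gto` follow the commonly described behavior of the LRR and GTO baselines. `pick_by_progress` is purely illustrative: the abstract does not say how CPWS maps progress to priority, so the `checkpoints` counter and the rule of favoring the warp with the fewest checkpoints are assumptions, not the authors' design. Treating a lower warp id as "older" is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Warp:
    wid: int              # warp id; a lower id is treated as "older" (assumption)
    ready: bool = True    # True if the warp can issue an instruction this cycle
    checkpoints: int = 0  # hypothetical progress counter: checkpoints passed so far


def pick_lrr(warps: List[Warp], last_issued: Optional[int]) -> Optional[int]:
    """Loose round robin: first ready warp after the one issued last."""
    n = len(warps)
    start = -1 if last_issued is None else last_issued
    for step in range(1, n + 1):
        w = warps[(start + step) % n]
        if w.ready:
            return w.wid
    return None  # no warp can issue this cycle


def pick_gto(warps: List[Warp], last_issued: Optional[int]) -> Optional[int]:
    """Greedy then oldest: stay on the same warp while it is ready,
    otherwise fall back to the oldest ready warp."""
    if last_issued is not None and warps[last_issued].ready:
        return last_issued
    ready = [w.wid for w in warps if w.ready]
    return min(ready) if ready else None


def pick_by_progress(warps: List[Warp]) -> Optional[int]:
    """Hypothetical progress-aware pick (NOT the CPWS policy from the paper):
    choose the ready warp with the fewest checkpoints, ties go to the lower id."""
    ready = [w for w in warps if w.ready]
    if not ready:
        return None
    return min(ready, key=lambda w: (w.checkpoints, w.wid)).wid


# The list index must equal the warp id for the functions above.
warps = [
    Warp(0, ready=False, checkpoints=3),
    Warp(1, ready=True, checkpoints=2),
    Warp(2, ready=True, checkpoints=1),
    Warp(3, ready=True, checkpoints=1),
]
print(pick_lrr(warps, last_issued=3))
print(pick_gto(warps, last_issued=2))
print(pick_by_progress(warps))
```

The three functions share one interface on purpose, so a simulator loop could swap policies without other changes. A real scheduler would also need per-cycle readiness updates and the multi-level structure the title refers to, neither of which is modeled here.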