CPWS: A checkpoint-based multi-level warp scheduler for GPGPU

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (9): 1563-1570.

JIANG Zekun, YUAN Bo, CUI Jianfeng, HUANG Libo, CHANG Junsheng, LIU Sheng

(College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China)

Received: 2024-10-23 Revised: 2024-11-01 Online: 2025-09-25 Published: 2025-09-22
Funding:
National "Ten Thousand Talents Program" Young Top-Notch Talent Support Program (ZD0202082503)

Abstract

General-purpose graphics processing units (GPGPUs) adopt the single-instruction multiple-thread (SIMT) model, which allows a large number of threads to execute the same instruction simultaneously and thereby significantly improves computing efficiency. Under the SIMT model, a GPGPU organizes groups of threads into logical execution units called warps. Because the hardware must be time-multiplexed among multiple warps, warp scheduling is critical to efficient parallel computing. This paper designs and implements CPWS, a checkpoint-based multi-level warp scheduler built on newly added checkpoint instructions. CPWS tracks the execution progress of each warp and dynamically adjusts its scheduling policy according to that progress, at low overall hardware overhead. Experimental results show that CPWS improves performance by 11% over the greedy-then-oldest (GTO) scheduler, by 16.7% over the loose round-robin (LRR) scheduler, and by 10.6% over the two-level round-robin scheduler. In addition, FPGA synthesis results indicate that CPWS increases logic-unit usage by only 0.8% relative to GTO.

Key words: general-purpose graphics processing unit (GPGPU), checkpoint, warp scheduler
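
The abstract does not spell out how checkpoint progress feeds into scheduling decisions, so the following is only a toy sketch of the general idea, not the CPWS design. It assumes a hypothetical `ckpt` opcode that increments a per-warp progress counter, and a hypothetical two-level selection rule: first prefer the warps with the least checkpoint progress, then break ties by picking the oldest warp (smallest `wid`). The names `Warp`, `pick_warp`, and `run` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Warp:
    """Toy warp: a list of opcodes plus a progress counter (illustrative)."""
    wid: int
    program: list          # e.g. ["op", "ckpt", "op"]
    pc: int = 0
    checkpoints: int = 0   # number of "ckpt" instructions issued so far

    @property
    def done(self):
        return self.pc >= len(self.program)


def pick_warp(warps):
    """Select the next warp to issue from (assumed policy, not from the paper).

    Level 1: unfinished warps with the least checkpoint progress.
    Level 2: among those, the oldest warp (smallest wid).
    """
    ready = [w for w in warps if not w.done]
    if not ready:
        return None
    least = min(w.checkpoints for w in ready)
    return min((w for w in ready if w.checkpoints == least), key=lambda w: w.wid)


def run(warps):
    """Issue one instruction per cycle; return the sequence of issuing warp ids."""
    trace = []
    while (w := pick_warp(warps)) is not None:
        if w.program[w.pc] == "ckpt":
            w.checkpoints += 1   # the checkpoint instruction reports progress
        w.pc += 1
        trace.append(w.wid)
    return trace
```

With two warps that each run `["op", "ckpt", "op"]`, `run` issues from warp 0 until it passes its checkpoint, switches to the lagging warp 1 until it catches up, and then falls back to oldest-first ordering. A real scheduler would also have to account for stalls, memory latency, and the hardware cost of the counters, none of which this sketch models.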

JIANG Zekun, YUAN Bo, CUI Jianfeng, HUANG Libo, CHANG Junsheng, LIU Sheng. CPWS: A checkpoint-based multi-level warp scheduler for GPGPU[J]. Computer Engineering & Science, 2025, 47(9): 1563-1570.
