• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• High Performance Computing

Eliminating GPGPU control-flow divergence through partial warp regrouping

沈立,杨耀华,王志英   

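  1. (College of Computer, National University of Defense Technology, Changsha, Hunan 410073)
  • Received: 2019-01-19 Revised: 2019-03-23 Published: 2019-08-25 Released: 2019-08-25
  • Supported by:

    National Natural Science Foundation of China (61472431)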

Eliminating control divergence on GPGPU via partial warp regrouping

SHEN Li, YANG Yao-hua, WANG Zhi-ying

  1. (School of Computer, National University of Defense Technology, Changsha 410073, China)
  • Received: 2019-01-19 Revised: 2019-03-23 Online: 2019-08-25 Published: 2019-08-25

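Abstract (translated from the Chinese):

GPUs are widely used in today's high-performance computing systems, but their performance is severely limited when threads follow different control-flow directions at run time. This problem is usually addressed with dynamic warp regrouping, which combines threads from one or more warps that execute along the same control-flow path into a new warp. However, such methods commonly perform unnecessary regrouping, which introduces considerable additional performance overhead. This paper analyzes the performance overhead of thread regrouping and proposes an optimization called "partial regrouping". While preserving regrouping efficiency, the method avoids regrouping warps that contain a large number of active threads, thereby effectively reducing the overhead that thread regrouping introduces. Test results show that partial regrouping delivers a fairly clear performance improvement while preserving regrouping efficiency.

Keywords: GPGPU, control-flow divergence, warp regrouping, framework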

Abstract:

GPUs are widely used in current high-performance computing systems. However, their performance is severely constrained when threads within a warp take different control-flow paths at run time. Warp regrouping methods are commonly applied to this problem: they combine threads that execute the same branch path, drawn from one or more warps, into a new warp. However, these methods perform some unnecessary regrouping, which introduces additional performance overhead. We analyze the sources of regrouping overhead and propose a partial warp regrouping approach. While preserving regrouping efficiency, it avoids regrouping warps that already have a large number of active threads, thereby reducing the performance overhead. Experimental results indicate that the proposed method can significantly reduce unnecessary overhead while maintaining regrouping efficiency.


 

Key words: GPGPU, control divergence, warp regrouping, framework
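
The partial regrouping idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' mechanism: the function name `regroup`, the `threshold` parameter, and the 4-thread warp size are all assumptions made for illustration, and hardware constraints such as register-file lane conflicts are ignored. The sketch only shows how skipping warps that already have many active threads can reduce the number of threads that must be moved.

```python
WARP_SIZE = 4  # kept small for readability; NVIDIA warps have 32 threads


def regroup(warps, threshold):
    """Repack only warps with fewer than `threshold` active threads.

    warps: list of lists, each holding the ids of the threads in one warp
           that are active on the current branch path.
    Returns (kept, merged): warps left untouched and newly formed warps.
    `threshold` is a hypothetical tuning knob, not a value from the paper.
    """
    kept, pool = [], []
    for active in warps:
        if not active:
            continue                      # no thread of this warp takes the path
        if len(active) >= threshold:
            kept.append(list(active))     # well utilized: skip regrouping
        else:
            pool.extend(active)           # under-utilized: candidate for merging
    # Pack the pooled threads into as few warps as possible.
    merged = [pool[i:i + WARP_SIZE] for i in range(0, len(pool), WARP_SIZE)]
    return kept, merged


active_per_warp = [[0, 1, 2], [4], [8, 9], [12, 13, 14, 15]]

# Full regrouping: a threshold above the warp size pools every warp.
full_kept, full_merged = regroup(active_per_warp, WARP_SIZE + 1)
# Partial regrouping: warps with at least 3 active threads are left alone.
part_kept, part_merged = regroup(active_per_warp, 3)

moved_full = sum(len(w) for w in full_merged)
moved_part = sum(len(w) for w in part_merged)
print(len(full_kept) + len(full_merged), moved_full)  # prints "3 10"
print(len(part_kept) + len(part_merged), moved_part)  # prints "3 3"
```

In this toy input both strategies issue three warps for the branch path, but partial regrouping relocates 3 threads instead of 10, which is the kind of saving the abstract attributes to avoiding unnecessary regrouping.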