• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• High Performance Computing •

A Cache Management and Memory Scheduling Mechanism Based on Inter-warp Heterogeneity

FANG Juan, WEI Zelin, YU Tingwen

  1. (College of Computer Science, Faculty of Information Science, Beijing University of Technology, Beijing 100124, China)
  • Received: 2018-10-08  Revised: 2018-12-12  Online: 2019-05-25
  • Funding:

    National Natural Science Foundation of China (61202076); Beijing Natural Science Foundation (4192007)



Abstract:

All threads within a warp execute the same instruction in lockstep on a GPU. Memory requests from some threads are served quickly, while requests from other threads experience long latencies. The warp cannot execute its next instruction until the slowest request is served, a phenomenon known as memory divergence. We study inter-warp heterogeneity in GPUs, and implement and optimize a cache management mechanism and a memory scheduling policy based on inter-warp heterogeneity, which reduce the negative impact of memory divergence and cache queuing latency. Warps are classified according to their L2 cache hit rate to drive the following three components: (1) a warp-type-based cache bypassing mechanism, which makes warps with low cache utilization bypass the L2 cache; (2) a warp-type-based cache insertion/promotion policy, which prevents data from warps with high cache utilization from being evicted prematurely; and (3) a warp-type-based memory scheduler, which prioritizes requests received from warps with high cache utilization as well as requests from the same warp. Compared with the baseline GPU, the cache management mechanism and memory scheduling policy based on inter-warp heterogeneity speed up 8 different GPGPU applications by 18.0% on average.
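The three warp-type-driven components above can be sketched in a few lines. This is a minimal illustrative model, not the paper's implementation: the hit-rate threshold, the two-way high/low split, and all function names are assumptions made here for clarity.

```python
# Illustrative sketch of warp-type-driven cache and scheduling policies.
# The threshold value and the binary warp classification are assumed,
# not taken from the paper.

HIT_RATE_THRESHOLD = 0.5  # assumed cutoff separating warp types


def classify_warp(l2_hits, l2_accesses):
    """Classify a warp as high- or low-cache-utilization by its L2 hit rate."""
    if l2_accesses == 0:
        return "high"  # no evidence yet; treat as cache-friendly
    return "high" if l2_hits / l2_accesses >= HIT_RATE_THRESHOLD else "low"


def should_bypass_l2(warp_type):
    """Component (1): warps with low cache utilization bypass the L2 cache."""
    return warp_type == "low"


def insertion_priority(warp_type):
    """Component (2): insert lines from high-utilization warps near the MRU
    position so they are not evicted prematurely; others go near LRU."""
    return "mru" if warp_type == "high" else "lru"


def schedule(requests):
    """Component (3): serve requests from high-utilization warps first, and
    keep requests from the same warp together so the whole warp unblocks."""
    return sorted(requests,
                  key=lambda r: (r["warp_type"] != "high", r["warp_id"]))
```

In this sketch the scheduler's sort key groups same-warp requests and ranks high-utilization warps first, mirroring the goal of letting the slowest request of a cache-friendly warp finish sooner.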

Key words: cache management, memory scheduling, memory divergence, warp