• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (12): 2111-2119.

• High Performance Computing •

Optimization of loop unrolling based on instruction Cache and register pressure

WANG Cui-xia (王翠霞), HAN Lin (韩林), LIU Hao-hao (刘浩浩)

  1. (Research Institute of Front Information Technology, Zhongyuan University of Technology, Zhengzhou 450007, China)

  • Received: 2021-11-25 Revised: 2022-01-30 Accepted: 2022-12-25 Online: 2022-12-25 Published: 2022-12-26

Abstract: Loop unrolling is a common compiler optimization technique that reduces loop overhead, improves instruction-level parallelism and data locality, and thereby improves the execution efficiency of loops. However, excessive unrolling can overflow the instruction Cache and increase register pressure, while too little unrolling wastes potential performance gains. Finding an appropriate unrolling factor is therefore the core problem in loop unrolling. Based on the open-source compiler GCC, this paper analyzes how the instruction Cache and register resources affect loop unrolling, proposes a method for calculating the unrolling factor based on instruction Cache and register pressure, and implements it in GCC. Experiments on the Sunway and Hygon platforms show that, compared with the other unrolling factor calculation methods currently in GCC, the proposed method obtains more effective unrolling factors and improves program performance. Average performance on the SPEC CPU 2006 benchmark suite improves by 2.7% and 3.1% on the two platforms, respectively, and on the NPB-3.3.1 benchmark suite by 5.4% and 6.1%, respectively.

Key words: compiler optimization, loop unrolling, unrolling factor, instruction Cache, register pressure
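
The trade-off described in the abstract can be illustrated with a minimal C sketch. The first two functions show what unrolling by a factor of 4 does to a simple reduction loop: fewer compare-and-branch steps per element, but a larger loop body and more live values. The third function, `choose_unroll_factor`, is a hypothetical illustration only: its name, its parameters, and its capping rule are assumptions made for this example, and it is not the calculation method proposed in the paper nor any GCC internal.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline loop: one add, one compare, and one branch per element. */
long sum_rolled(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same loop unrolled by hand with factor 4.
 * Loop-control overhead is paid once per four elements, and the four
 * independent accumulators expose instruction-level parallelism.
 * The cost: a larger loop body (more instruction Cache space) and
 * more simultaneously live values (more register pressure). */
long sum_unrolled4(const int *a, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}

/* HYPOTHETICAL heuristic (an assumption for illustration, not the
 * paper's formula): cap the unrolling factor so that
 *   - the unrolled body still fits in the instruction Cache, and
 *   - the registers needed by all copies do not exceed those available.
 * All parameter names are invented for this sketch. */
unsigned choose_unroll_factor(unsigned body_bytes, unsigned icache_bytes,
                              unsigned regs_per_iter, unsigned regs_available,
                              unsigned max_factor) {
    assert(body_bytes > 0 && regs_per_iter > 0 && max_factor > 0);
    unsigned by_cache = icache_bytes / body_bytes;    /* Cache limit */
    unsigned by_regs = regs_available / regs_per_iter; /* register limit */
    unsigned f = max_factor;
    if (by_cache < f) f = by_cache;
    if (by_regs < f) f = by_regs;
    return f < 1 ? 1 : f; /* factor 1 means "do not unroll" */
}
```

In this sketch the smaller of the two resource limits decides the factor, which mirrors the abstract's point that both instruction Cache overflow and register pressure bound how far a loop can usefully be unrolled. A real compiler would estimate body size and register demand from its intermediate representation rather than take them as arguments.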