多面体模型下的循环置换与自动调优

计算机工程与科学 ›› 2023, Vol. 45 ›› Issue (12): 2121-2134.

多面体模型下的循环置换与自动调优

彭畅1,2,刘青枝1,2,陈长波1,2

（1.中国科学院重庆绿色智能技术研究院，重庆 400714;2.中国科学院大学重庆学院，重庆 400714）

收稿日期:2023-03-06 修回日期:2023-08-22 接受日期:2023-12-25 出版日期:2023-12-25 发布日期:2023-12-14
基金资助:
国家重点研发计划（2020YFA0712300）;重庆英才计划青年拔尖项目 (2021000263);中国科学院“西部之光”；重庆市院士牵头科技创新引导专项(cstc2019yszx-jcyjX0003,cstc2020yszx-jcyjX0005,cstc2021yszx-jcyjX0005)

Loop permutation and auto-tuning under polyhedral model

PENG Chang1,2,LIU Qing-zhi1,2,CHEN Chang-bo1,2

(1Chongqing Institute of Green and Intelligent Technology,Chinese Academy of Sciences,Chongqing 400714;
2..Chongqing School,University of Chinese Academy of Sciences,Chongqing 400714,China)

Received:2023-03-06 Revised:2023-08-22 Accepted:2023-12-25 Online:2023-12-25 Published:2023-12-14

摘要/Abstract

摘要： 针对常用多面体编译器Pluto默认循环调度和分块大小性能欠佳的问题，提出了一种为其调度计算多种合法置换，根据置换和分块大小构成的配置空间为循环程序自动调优的方法。通过对定义循环融合的标量维度的处理，实现了非完美嵌套循环块间和块内的同时置换。构建了4种机器学习驱动的自动调优策略，为循环程序在指定问题规模下寻找优化的置换序和分块大小组合。默认分块大小下，扩展后的Pluto编译器并行环境下生成的最佳置换相较于Pluto的默认调度取得了最高4.02和几何平均2.12的加速比。通过进一步搜索更优的置换序和分块大小组合，最好的自动调优策略在并行环境下相较于Pluto的默认优化取得了最高5.48和几何平均2.86的加速比。此外，指定问题规模下，自动调优得到的最佳配置和学习模型应用于相似问题规模时，相较于Pluto的默认优化也取得了不同程度的性能提升。

关键词: 循环置换, 循环分块, 多面体模型, 自动调优, 机器学习

Abstract: Aiming at improving the performance of the default loop scheduling and tile size of Pluto, a commonly used polyhedral compiler, this paper proposes a method to compute a variety of legal permutations for its default scheduling and auto-tune its performance according to the configuration space composed of permutations and tile sizes. Through the processing of scalar dimension that defines loop fusion, both intra and inter permutations for imperfect loop nest are realized. Four machine learning driven auto-tuning strategies are proposed to find the optimized combination of permutation order and tile size for a loop with a given problem size. Under the default tile size, the optimal permutation gene- rated by the extended Pluto compiler in a parallel environment achieves a maximum speedup of 4.02 and a geometric mean of 2.12 compared with the default scheduling of Pluto. By further searching for a better combination of permutation order and tile size, the best auto-tuning strategy achieves a maximum speedup of 5.48 and a geometric mean of 2.86 compared with Pluto's default optimization in a parallel environment. In addition, the best configuration and the learned model obtained by auto-tuning for a particular problem size, when being applied to similar problem sizes, also outperform the default optimization of Pluto in various degrees.

Key words: loop permutation, loop tiling, polyhedral model, auto-tuning, machine learning

彭畅, 刘青枝, 陈长波, . 多面体模型下的循环置换与自动调优[J]. 计算机工程与科学, 2023, 45(12): 2121-2134.

PENG Chang, LIU Qing-zhi, CHEN Chang-bo, . Loop permutation and auto-tuning under polyhedral model[J]. Computer Engineering & Science, 2023, 45(12): 2121-2134.

编辑推荐

Metrics

阅读次数

全文

528

HTML			PDF

最新录用	在线预览	正式出版	最新录用	在线预览	正式出版
0	0	0	0	0	528

来源	本网站	其他网站

次数	402	126
比例	76%	24%

摘要

309

最新录用	在线预览	正式出版

0	0	309

	来源	本网站

	次数	309
	比例	100%

[1]	温鑫, 曾焘, 李春波, 徐子晨. 面向服务器无感计算的模型推理服务切换方法研究[J]. 计算机工程与科学, 2024, 46(07): 1210-1217.
[2]	高珊, 李世杰, 蔡志平. 基于深度学习的中文文本分类综述[J]. 计算机工程与科学, 2024, 46(04): 684-692.
[3]	赵振宇, 杨天豪, 蒋汶乘, 张书政. 基于机器学习的多压多温多参标准单元延迟快速计算方法[J]. 计算机工程与科学, 2023, 45(08): 1331-1338.
[4]	李小玲, 方建滨, 马俊, 谭霜, 谭郁松. 基于监督学习的稀疏矩阵自动任务分配[J]. 计算机工程与科学, 2023, 45(05): 782-789.
[5]	唐阳坤, 鲜港, 杨文祥, 喻杰, 张晓蓉, 王耀彬. 基于用户行为的超级计算机作业失败预测方法[J]. 计算机工程与科学, 2022, 44(10): 1753-1761.
[6]	李文丽. 基于朴素贝叶斯分类的网络谣言识别研究[J]. 计算机工程与科学, 2022, 44(03): 495-501.
[7]	孙成国1，兰静2，姜浩1. 一种基于遗传算法的BLAS库优化方法[J]. 计算机工程与科学, 2018, 40(05): 798-804.