[1] |
Tian W Y,Xue C J,Li M M,et al.Loop fusion and reordering for register file optimization on stream processors [J].Journal of Systems and Software,2012,85(7):1673-1681.
|
[2] |
Liu S,Cui Y Z,Jiang Q,et al.An efficient tile size selection model based on machine learning[J].Journal of Parallel and Distributed Computing,2018,121:27-41.
|
[3] |
Alshboul M,Tuck J,Solihin Y.WET:Write efficient loop tiling for non-volatile main memory[C]∥Proc of 2020 57th ACM/IEEE Design Automation Conference,2020:1-6.
|
[4] |
Qiao B,Reiche O,Hannig F,et al.From loop fusion to kernel fusion:A domain-specific approach to locality optimization[C]∥Proc of 2019 IEEE/ACM International Symposium on Code Generation and Optimization,2019:242-253.
|
[5] |
Acharya A,Bondhugula U,Cohen A.Effective loop fusion in polyhedral compilation using fusion conflict graphs[J].ACM Transactions on Architecture and Code Optimization,2020,17(4):1-26.
|
[6] |
Stephenson M,Amarasinghe S.Predicting unroll factors using supervised classification [C]∥Proc of the International Symposium on Code Generation and Optimization,2005:123-134.
|
[7] |
Kisuki T,Knijnenburg P M W,O'Boyle M F P.Combined selection of tile sizes and unroll factors using iterative compilation [C]∥Proc of 2000 International Conference on Parallel Architectures and Compilation Techniques,2000:237-246.
|
[8] |
Pérard-Gayot A,Membarth R,Slusallek P,et al.A data layout transformation for vectorizing compilers[C]∥Proc of the 4th Workshop on Programming Models for SIMD/Vector Processing,2018:1-8.
|
[9] |
Feng Hui, Wang Ya-gang. Using graph neural networks to enhance compiler code vectorization heuristics[J]. Application Research of Computers,2021, 38(8):2349-2353.(in Chinese)
|
[10] |
Wolf M E,Lam M S.A loop transformation theory and an algorithm to maximize parallelism[J].IEEE Transactions on Parallel & Distributed Systems,1991,2(4):452-471.
|
[11] |
Prema S,Nasre R,Jehadeesan R,et al.A study on popular auto-parallelization frameworks[J].Concurrency and Computation:Practice and Experience,2019,31(17):e5168.
|
[12] |
Feautrier P, Lengauer C.Polyhedron model[M].Boston:Springer, 2011:1581-1592.
|
[13] |
Bondhugula U,Hartono A,Ramanujam J,et al.A practical automatic polyhedral parallelizer and locality optimizer[C]∥Proc of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation,2008:101-113.
|
[14] |
Zhao Jie, Li Ying-ying, Zhao Rong-cai. Black magic of polyhedral compilation[J].Journal of Software, 2018, 29(8):2371-2396.(in Chinese)
|
[15] |
Narasimhan K,Acharya A,Baid A,et al.A practical tile size selection model for affine loop nests[C]∥Proc of the ACM International Conference on Supercomputing,2021:27-39.
|
[16] |
Kelefouras V, Djemame K, Keramidas G, et al. A methodology for efficient tile size selection for affine loop kernels[J].International Journal of Parallel Programming,2022,50(3-4):405-432.
|
[17] |
Liu Song, Wu Wei-guo, Zhao Bo, et al. Loop tiling for optimization of locality and parallelism[J]. Journal of Computer Research and Development, 2015, 52(5):1160-1176. (in Chinese)
|
[18] |
Zhou X,Garzarán M J,Padua D A.Optimal parallelogram selection for hierarchical tiling[J].ACM Transactions on Architecture and Code Optimization,2015,11(4):Article No.: 58.
|
[19] |
Li Ying-ying, Zhao Jie,Pang Jian-min. Split tiling design and implementation in the polyhedral model [J]. Chinese Journal of Computers, 2020, 43(6):1038-1051. (in Chinese)
|
[20] |
Chi Hao-yu, Chen Chang-bo. Survey on automatic tuning of compilers by machine learning[J]. Computer Science, 2022, 49(1):241-251.(in Chinese)
|
[21] |
Sato Y, Yuki T, Endo T.An autotuning framework for scalable execution of tiled code via iterative polyhedral compilation[J].ACM Transactions on Architecture and Code Optimization,2019,15(4):Article No.: 67.
|
[22] |
Brauckmann A,Goens A,Castrillon J.PolyGym:Polyhedral optimizations as an environment for reinforcement learning[C]∥Proc of 2021 30th International Conference on Parallel Architectures and Compilation Techniques,2021:17-29.
|
[23] |
Baghdadi R,Ray J,Romdhane M B,et al.Tiramisu:A polyhedral compiler for expressing fast and portable code[C]∥Proc of 2019 IEEE/ACM International Symposium on Code Generation and Optimization,2019:193-205.
|
[24] |
Tavarageri S,Heinecke A,Avancha S,et al.PolyDL:Polyhedral optimizations for creation of high-performance DL primitives[J].ACM Transactions on Architecture and Code Optimization,2021,18(1):Article No.: 11.
|
[25] |
Zhao J,Li B,Nie W,et al.AKG:Automatic kernel generation for neural processing units using polyhedral transformations[C]∥Proc of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation,2021:1233-1248.
|
[26] |
Liu S,Cui Y,Jiang Q,et al.An efficient tile size selection model based on machine learning[J].Journal of Parallel and Distributed Computing,2018,121:27-41.
|
[27] |
Chi Hao-yu,Chen Chang-bo. Prediction of loop tiling size based on neural network [J]. Computer Science, 2020,47(8):62-70. (in Chinese)
|
[28] |
Herruzo E,Bandera G,Plata O G,et al.Reducing cache misses by loop reordering[C]∥Proc of International Conference on Parallel Computing: Current & Future Issues of High-End Computing,2005:541-548.
|
[29] |
Li M, Liu Y,Liu X,et al.The deep learning compiler:A comprehensive survey[J].IEEE Transactions on Parallel and Distributed Systems,2020,32(3):708-727.
|
[30] |
Chen T Q,Moreau T,Jiang Z H,et al.TVM:An automated end-to-end optimizing compiler for deep learning[C]∥Proc of the 13th USENIX Conference on Operating Systems Design and Implementation,2018:578-594.
|
[31] |
Fegade P,Chen T,Gibbons P,et al.Cortex:A compiler for recursive deep learning models[C]∥Proc of the 4th Machine Learning and Systems Conference,2021:38-54.
|
[32] |
Ma L X,Xie Z Q,Yang Z,et al.RAMMER:Enabling holistic deep learning compiler optimizations with rTasks[C]∥Proc of the 14th USENIX Conference on Operating Systems Design and Implementation,2020:881-897.
|
[33] |
Yuki T.Understanding PolyBench/C 3.2 kernels[C]∥Proc of International Workshop on Polyhedral Compilation Techniques,2014:1-5.
|
[34] |
Chen T Q,Guestrin C.XGBoost:A scalable tree boosting system[C]∥Proc of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,2016:785-794.
|
|
Appendix: references in Chinese:
|
[9] |
冯晖,王亚刚.基于深度图网络的编译器向量化启发式算法[J].计算机应用研究,2021,38(8):2349-2353.
|
[14] |
赵捷,李颖颖,赵荣彩.基于多面体模型的编译“黑魔法”[J].软件学报,2018,29(8):2371-2396.
|
[17] |
刘松,伍卫国,赵博,等.面向局部性和并行优化的循环分块技术[J].计算机研究与发展,2015,52(5):1160-1176.
|
[19] |
李颖颖,赵捷,庞建民.多面体模型中分裂分块算法的设计与实现[J].计算机学报,2020,43(6):1038-1051.
|
[20] |
池昊宇,陈长波.基于机器学习的编译器自动调优综述[J].计算机科学,2022,49(1):241-251.
|
[27] |
池昊宇,陈长波.基于神经网络的循环分块大小预测[J].计算机科学,2020,47(8):62-70.
|