
Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (10): 1890-1900.

• Artificial Intelligence and Data Mining •


Research on compiler optimization methods based on source code migration

ZHOU Fang, LIU Maofu, LI Shanzhi

  (1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065;
    2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan 430065;
    3. School of Computer, Wuhan Qingchuan University, Wuhan 430204, China)
  • Received: 2024-03-01   Revised: 2024-08-14   Online: 2025-10-25   Published: 2025-10-29


Abstract: Compiler optimization aims to improve the execution efficiency of code on target platforms by applying a series of transformations to the intermediate representation (IR). Traditional methods typically rely on machine learning to analyze IR features and predict the optimal combination of LLVM compiler optimization passes. However, these methods depend on the compiler's existing optimization strategies and make limited use of global information, which restricts their scalability. This study uses deep learning to automatically translate function-level IR from its unoptimized state to the O2 optimization level, treating the optimization process as a translation task. By introducing a dense data flow graph (DDFG), the method extracts global structural information from the IR code, guiding the model to learn code semantics more comprehensively. Experiments with a Transformer model show that the proposed method can be effectively trained on O2-level IR, and 86.15% of the function-level optimized code can execute correctly on the compiler while preserving semantic integrity.
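
To make the translation setup concrete, the following is a minimal sketch (not taken from the paper) of how pairs of unoptimized and O2-optimized IR could be produced with the standard LLVM toolchain. It assumes clang and opt are available on PATH; the file names and the emit_ir_pair helper are hypothetical, and the paper's actual data pipeline, function-level splitting, and DDFG construction are not shown.

# Sketch only: generate (O0 IR, O2 IR) pairs with the LLVM toolchain.
# Assumes `clang` and `opt` are installed; `example.c` is a placeholder input.
import subprocess
from pathlib import Path


def emit_ir_pair(c_source: Path, out_dir: Path) -> tuple[Path, Path]:
    """Compile a C file to unoptimized textual IR, then optimize it at O2."""
    out_dir.mkdir(parents=True, exist_ok=True)
    unopt = out_dir / (c_source.stem + ".O0.ll")
    opt_o2 = out_dir / (c_source.stem + ".O2.ll")

    # Emit textual LLVM IR without optimization. `-disable-O0-optnone`
    # keeps clang from tagging functions `optnone`, so opt can still
    # transform them afterwards.
    subprocess.run(
        ["clang", "-S", "-emit-llvm", "-O0", "-Xclang", "-disable-O0-optnone",
         str(c_source), "-o", str(unopt)],
        check=True,
    )

    # Run the standard O2 pipeline with opt's new pass manager.
    subprocess.run(
        ["opt", "-S", "-passes=default<O2>", str(unopt), "-o", str(opt_o2)],
        check=True,
    )
    return unopt, opt_o2


if __name__ == "__main__":
    src_ir, tgt_ir = emit_ir_pair(Path("example.c"), Path("ir_pairs"))
    print(f"source IR: {src_ir}\ntarget IR: {tgt_ir}")

Function-level units could then, for example, be pulled out of the resulting modules with llvm-extract before tokenization; how the paper actually segments functions and builds the DDFG is described in the full text, not here.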

Key words: compiler optimization, code translation, dense data flow graph (DDFG), data flow prediction