
Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (04): 638-645.

• Software Engineering •

Automatic code comment generation of Tree2Seq based on attention mechanism

ZHAO Le-le, ZHANG Li-ping, ZHAO Feng-rong

  1. (College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China)
  • Received: 2021-05-12  Revised: 2021-09-27  Accepted: 2023-04-25  Online: 2023-04-25  Published: 2023-04-13
  • Funding:
    National Natural Science Foundation of China (61462071); Natural Science Foundation of Inner Mongolia (2018MS06009); Scientific Research Project of Higher Education Institutions of Inner Mongolia Autonomous Region (NJZY19026); Independent Research Project of Inner Mongolia Normal University (29K19ZZYF017); Graduate Research Innovation Fund of Inner Mongolia Normal University (CXJJS20126)

Abstract: Code comments help developers quickly understand code and reduce code maintenance costs. The classical Seq2Seq model compresses the structural information of code into a sequence, which causes that structural information to be lost. To preserve it, a Tree-LSTM encoder is proposed that transforms the code directly into an abstract syntax tree for encoding, so that the comment generation model can effectively capture the structural information of the code and generate better comments. A Tree2Seq model based on an attention mechanism is adopted for the code comment generation task, avoiding the situation in which the encoder compresses all input information into a single fixed vector and loses part of it. Experiments are carried out on datasets in two programming languages, Java and Python, using three automatic evaluation metrics commonly used in machine translation, and a portion of the test data is also evaluated manually. The experimental results show that the attention-based Tree2Seq model can provide the decoder with more comprehensive and richer semantic and structural information, and they offer guidance for subsequent experimental analysis and improvement.
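The abstract describes two mechanisms: a Tree-LSTM encoder that consumes the abstract syntax tree directly, and an attention layer that lets the decoder look at every encoded node rather than a single fixed vector. As a rough illustration only (the paper's implementation is not reproduced here), the following PyTorch sketch shows a child-sum Tree-LSTM cell in the style of Tai et al. (2015) and a dot-product attention step; all names and dimensions are assumptions, not the authors' code.

    import torch
    import torch.nn as nn

    class ChildSumTreeLSTMCell(nn.Module):
        """Child-sum Tree-LSTM cell (Tai et al., 2015) for one AST node."""
        def __init__(self, x_dim, h_dim):
            super().__init__()
            # input (i), output (o) and candidate (u) gates, computed jointly
            self.iou_x = nn.Linear(x_dim, 3 * h_dim)
            self.iou_h = nn.Linear(h_dim, 3 * h_dim, bias=False)
            # one forget gate per child, conditioned on that child's hidden state
            self.f_x = nn.Linear(x_dim, h_dim)
            self.f_h = nn.Linear(h_dim, h_dim, bias=False)

        def forward(self, x, child_h, child_c):
            # x: (x_dim,) node embedding; child_h, child_c: (n_children, h_dim)
            h_tilde = child_h.sum(dim=0)                 # sum of children's states
            i, o, u = (self.iou_x(x) + self.iou_h(h_tilde)).chunk(3, dim=-1)
            i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
            f = torch.sigmoid(self.f_x(x) + self.f_h(child_h))  # per-child gate
            c = i * u + (f * child_c).sum(dim=0)
            h = o * torch.tanh(c)
            return h, c

    def attend(dec_h, node_h):
        # Dot-product attention over all encoded AST nodes, so the decoder
        # is not limited to one fixed context vector.
        weights = torch.softmax(node_h @ dec_h, dim=0)   # (n_nodes,)
        return weights @ node_h                          # context: (h_dim,)

Encoding would proceed bottom-up: each node's (h, c) pair is computed from its children's states, and the decoder would call attend at every generation step to build its context vector.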

Key words: code comment, automatic generation, attention mechanism, Tree2Seq
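The abstract refers to three automatic metrics commonly used in machine translation without naming them on this page; BLEU is one such metric, and a minimal, assumed example of scoring a generated comment against a reference with NLTK could look like this:

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = [["returns", "the", "maximum", "of", "two", "values"]]  # ground-truth comment
    candidate = ["returns", "the", "max", "of", "two", "values"]        # generated comment
    # Smoothing avoids zero scores on short sentences with missing n-grams.
    score = sentence_bleu(reference, candidate,
                          smoothing_function=SmoothingFunction().method1)
    print(f"BLEU: {score:.3f}")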