
Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (04): 638-645.

• Software Engineering •

Automatic code comment generation of Tree2Seq based on attention mechanism

ZHAO Le-le, ZHANG Li-ping, ZHAO Feng-rong

  1. (College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China)
  • Received: 2021-05-12  Revised: 2021-09-27  Accepted: 2023-04-25  Online: 2023-04-25  Published: 2023-04-13
  • Funding:
    National Natural Science Foundation of China (61462071); Natural Science Foundation of Inner Mongolia (2018MS06009); Scientific Research Project of Higher Education Institutions of Inner Mongolia Autonomous Region (NJZY19026); Independent Research Project of Inner Mongolia Normal University (29K19ZZYF017); Graduate Research Innovation Fund of Inner Mongolia Normal University (CXJJS20126)

Abstract: Code comments help developers quickly understand code and reduce code maintenance costs. The classical Seq2Seq model compresses the structural information of code into a sequence, which causes that structural information to be lost. To preserve it, a Tree-LSTM encoder is proposed that transforms the code directly into an abstract syntax tree for encoding, so that the comment generation model can effectively capture the structural information of the code and generate better comments. A Tree2Seq model based on an attention mechanism is adopted for the code comment generation task, avoiding the situation in which the encoder compresses all input information into a single fixed vector and loses part of it. Experiments are carried out on datasets in two programming languages, Java and Python, using three automatic evaluation metrics commonly used in machine translation, and a portion of the test data is also evaluated manually. The experimental results show that the attention-based Tree2Seq model can provide the decoder with more comprehensive and richer semantic and structural information, and they offer guidance for subsequent experimental analysis and improvement.
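The abstract describes two mechanisms: a Tree-LSTM encoder that consumes the abstract syntax tree directly, and an attention layer that lets the decoder look at every encoded node rather than a single fixed vector. As a rough illustration only (the paper's implementation is not reproduced here), the following PyTorch sketch shows a child-sum Tree-LSTM cell in the style of Tai et al. (2015) and a dot-product attention step; all names and dimensions are assumptions, not the authors' code.

    import torch
    import torch.nn as nn

    class ChildSumTreeLSTMCell(nn.Module):
        """Child-sum Tree-LSTM cell (Tai et al., 2015) for one AST node."""
        def __init__(self, x_dim, h_dim):
            super().__init__()
            # input (i), output (o) and candidate (u) gates, computed jointly
            self.iou_x = nn.Linear(x_dim, 3 * h_dim)
            self.iou_h = nn.Linear(h_dim, 3 * h_dim, bias=False)
            # one forget gate per child, conditioned on that child's hidden state
            self.f_x = nn.Linear(x_dim, h_dim)
            self.f_h = nn.Linear(h_dim, h_dim, bias=False)

        def forward(self, x, child_h, child_c):
            # x: (x_dim,) node embedding; child_h, child_c: (n_children, h_dim)
            h_tilde = child_h.sum(dim=0)                 # sum of children's states
            i, o, u = (self.iou_x(x) + self.iou_h(h_tilde)).chunk(3, dim=-1)
            i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
            f = torch.sigmoid(self.f_x(x) + self.f_h(child_h))  # per-child gate
            c = i * u + (f * child_c).sum(dim=0)
            h = o * torch.tanh(c)
            return h, c

    def attend(dec_h, node_h):
        # Dot-product attention over all encoded AST nodes, so the decoder
        # is not limited to one fixed context vector.
        weights = torch.softmax(node_h @ dec_h, dim=0)   # (n_nodes,)
        return weights @ node_h                          # context: (h_dim,)

Encoding would proceed bottom-up: each node's (h, c) pair is computed from its children's states, and the decoder would call attend at every generation step to build its context vector.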

Key words: code comment, automatic generation, attention mechanism, Tree2Seq
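The abstract refers to three automatic metrics commonly used in machine translation without naming them on this page; BLEU is one such metric, and a minimal, assumed example of scoring a generated comment against a reference with NLTK could look like this:

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = [["returns", "the", "maximum", "of", "two", "values"]]  # ground-truth comment
    candidate = ["returns", "the", "max", "of", "two", "values"]        # generated comment
    # Smoothing avoids zero scores on short sentences with missing n-grams.
    score = sentence_bleu(reference, candidate,
                          smoothing_function=SmoothingFunction().method1)
    print(f"BLEU: {score:.3f}")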