• A journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (04): 667-675.

• Software Engineering •

  • Supported by: the National Natural Science Foundation of China (62176107, 62376109)

A code summarization generation model fusing multi-structure data

YU Tian-ci, GAO Shang   

  1. (School of Computer Science, Jiangsu University of Science and Technology, Zhenjiang 212100, China)
  • Received:2023-04-03 Revised:2023-10-13 Accepted:2024-04-25 Online:2024-04-25 Published:2024-04-18


Abstract: Code summarization helps developers understand the function and implementation of code. A code summarization generation model automatically identifies the key information in code and generates relevant summaries, improving code readability and maintainability. Existing code summarization generation models usually use only abstract syntax tree structure information to represent code, resulting in low-quality generated summaries. To address this problem, this paper proposes a code summarization generation model that fuses multi-structure data. Firstly, on the basis of the abstract syntax tree, the model adds data flow graph structure information to represent code. Secondly, to capture the global information of the code, the model uses a Transformer encoder to encode the abstract syntax tree sequence. In addition, the model uses a graph neural network to extract features from the data flow graph, providing information such as the computational dependencies between variables. Finally, the model uses a cross-modal attention mechanism to fuse the abstract syntax tree and data flow features, and generates the corresponding summary through the Transformer decoder. Experimental results show that, compared with six mainstream models, the proposed model achieves higher BLEU, METEOR, and ROUGE-L scores on the Java and Python datasets, and the generated summaries also have good readability.
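The cross-modal fusion step described in the abstract can be sketched as scaled dot-product cross-attention in which AST-sequence features act as queries over data-flow-graph node features. The shapes, hidden dimension, and residual connection below are illustrative assumptions for a minimal sketch, not the paper's exact architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(ast_feats, dfg_feats):
    """Fuse AST-sequence features (queries) with data-flow-graph
    node features (keys/values) via scaled dot-product attention."""
    d = ast_feats.shape[-1]
    scores = ast_feats @ dfg_feats.T / np.sqrt(d)   # (n_ast, n_dfg)
    weights = softmax(scores, axis=-1)              # attend over DFG nodes
    fused = weights @ dfg_feats                     # (n_ast, d)
    return fused + ast_feats                        # residual connection (assumed)

rng = np.random.default_rng(0)
ast = rng.standard_normal((6, 16))   # 6 AST tokens, 16-dim features
dfg = rng.standard_normal((4, 16))   # 4 DFG nodes, 16-dim features
out = cross_modal_attention(ast, dfg)
print(out.shape)  # (6, 16)
```

In this sketch each AST position receives a weighted mixture of data-flow node features, so variable-dependency information can condition the decoder's view of every token.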

Key words: code understanding, code summarization generation, graph neural network, multi-feature fusion, natural language processing
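Of the three reported metrics, ROUGE-L is the simplest to state precisely: an F-measure over the longest common subsequence between a generated summary and a reference. A minimal sketch follows; the β weighting and whitespace tokenization are common conventions, not details taken from the paper:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[-1][-1]

def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-score between two whitespace-tokenized summaries."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return (1 + beta**2) * prec * rec / (rec + beta**2 * prec)

print(rouge_l("returns the sum of two numbers",
              "return the sum of the numbers"))  # 4/6 ≈ 0.667
```

Because ROUGE-L rewards in-order word overlap rather than exact n-gram matches, it complements BLEU (n-gram precision) and METEOR (stem/synonym-aware matching) when scoring generated summaries.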