• Official Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2021, Vol. 43, Issue (07): 1324-1330.

• Artificial Intelligence and Data Mining •

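A source code authorship attribution prediction method based on code coupling degree and PDG features

CHEN Jie, FENG Xiu-fang, CHEN Yong-le

  1. (School of Information and Computer, Taiyuan University of Technology, Jinzhong, Shanxi 030600, China)
  • Received: 2020-06-17 Revised: 2020-07-12 Accepted: 2021-07-25 Online: 2021-07-25 Published: 2021-08-17
  • Funding:
    Key Research and Development Program of Shanxi Province (201903D121121); Open Fund of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University (VRLAB2019A05)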

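Abstract: To identify the true author of a piece of source code within a corpus, this paper proposes CPNN, a neural network model that combines code coupling degree with program dependency graph (PDG) features to attribute source code to its author. First, the coupling degree of the code is computed from features extracted from the source code, such as parameters, fan-in, and fan-out. Second, control and data dependencies are extracted from the program dependency graph into which the code is converted; a preprocessing step turns these PDG features into small instances carrying frequency information, and term frequency-inverse document frequency (TF-IDF) weighting amplifies the importance of each PDG feature in the source code. Finally, the CPNN model predicts each programmer's coding-style features and attributes the coding style to its true author. Authorship attribution on a source code dataset from 1,000 programmers achieves an accuracy of 95%.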


Key words: coupling degree, program dependency graph, authorship attribution
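The abstract names the inputs to CPNN (a coupling degree built from parameters, fan-in, and fan-out, plus TF-IDF-weighted PDG dependencies) but gives no formulas. The Python sketch below shows one plausible way to build such a feature vector, under loudly stated assumptions: the coupling degree uses a Dhama-style module-coupling metric, C = 1 - 1/M, where control parameters and control globals are weighted by 2. This is a swapped-in textbook metric, not the paper's own formula. PDG dependencies are assumed to be already serialized as string tokens such as `data:decl->call`, which is a hypothetical encoding. IDF uses the unsmoothed log(N/df). The names `ModuleInfo`, `coupling_degree`, `tfidf`, and `feature_vector` are hypothetical, and the CPNN network itself is not reproduced.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class ModuleInfo:
    """Hypothetical per-function counts assumed to be extracted from source code."""
    data_params_in: int       # d_i: input data parameters
    control_params_in: int    # c_i: input control parameters (flags)
    data_params_out: int      # d_o: output data parameters (e.g. return values)
    control_params_out: int   # c_o: output control parameters
    fan_out: int              # w: number of other functions this one calls
    fan_in: int               # r: number of functions that call this one
    global_data: int = 0      # g_d: global variables used as data
    global_control: int = 0   # g_c: global variables used as control


def coupling_degree(m: ModuleInfo) -> float:
    """Dhama-style coupling C = 1 - 1/M; a stand-in, not the paper's formula."""
    total = (m.data_params_in + 2 * m.control_params_in
             + m.data_params_out + 2 * m.control_params_out
             + m.global_data + 2 * m.global_control
             + m.fan_out + m.fan_in)
    if total == 0:
        return 0.0  # assumption: a module with no connections is uncoupled
    return 1.0 - 1.0 / total


def tfidf(programs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF weight of each PDG dependency token, computed per program."""
    n = len(programs)
    doc_freq = Counter()
    for tokens in programs:
        doc_freq.update(set(tokens))  # count each token once per program
    weights = []
    for tokens in programs:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            tok: (c / length) * math.log(n / doc_freq[tok])
            for tok, c in counts.items()
        })
    return weights


def feature_vector(weights: dict[str, float], vocab: list[str],
                   coupling: float) -> list[float]:
    """Fixed-length input: one TF-IDF slot per vocabulary token, plus coupling."""
    return [weights.get(tok, 0.0) for tok in vocab] + [coupling]


# Usage: two toy programs with hypothetical dependency tokens.
programs = [
    ["control:if->assign", "data:decl->call", "data:decl->call"],
    ["control:for->assign", "data:decl->call"],
]
weights = tfidf(programs)
vocab = sorted({tok for tokens in programs for tok in tokens})
vec = feature_vector(weights[1], vocab, coupling_degree(ModuleInfo(2, 1, 1, 0, 3, 1)))
print(vec)
```

With the plain log(N/df) form, a dependency token that appears in every program gets weight 0. As a result, only dependencies that distinguish programs contribute, which fits the abstract's aim of amplifying distinctive PDG features. A smoothed IDF, as used in common libraries, would instead keep such tokens at a small positive weight.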
