• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (07): 1324-1330.

A source code authorship attribution prediction method based on code coupling degree and PDG features 

CHEN Jie, FENG Xiu-fang, CHEN Yong-le

  1. (School of Information and Computer, Taiyuan University of Technology, Jinzhong 030600, China)
  • Received: 2020-06-17  Revised: 2020-07-12  Accepted: 2021-07-25  Online: 2021-07-25  Published: 2021-08-17

Abstract: To identify the true authors of source code in a corpus, this paper proposes an authorship attribution method that combines code coupling degree with program dependency graph (PDG) features. First, parameter, fan-in, and fan-out features are extracted from the source code and used to calculate its coupling degree. Second, the source code is converted into a program dependency graph, from which control and data dependencies are extracted; preprocessing converts these PDG features into small instances with frequency details, and term frequency-inverse document frequency (TF-IDF) weighting is used to amplify the importance of each PDG feature in the source code. Finally, the CPNN model is used to predict programmers' coding-style characteristics and to attribute each piece of code to its real author. The results show that authorship attribution prediction on a source code data set of 1000 programmers reaches an accuracy of 95%.
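
The frequency-weighting step mentioned in the abstract can be illustrated with a standard TF-IDF computation. The sketch below is not the authors' implementation: the feature strings, the function name `tfidf_weights`, and the specific variant (relative term frequency multiplied by the natural log of N/df, without smoothing) are assumptions chosen only to show how a PDG feature that is rare across the corpus receives a larger weight than one that every sample shares.

```python
import math
from collections import Counter


def tfidf_weights(samples):
    """Return one {feature: weight} dict per sample.

    Assumed variant: tf = count / sample length, idf = ln(N / df).
    `samples` is a list of lists of PDG feature tokens (hypothetical format).
    """
    n_samples = len(samples)

    # Document frequency: in how many samples each feature occurs at least once.
    doc_freq = Counter()
    for features in samples:
        doc_freq.update(set(features))

    weights = []
    for features in samples:
        counts = Counter(features)
        total = len(features)
        weights.append({
            feature: (count / total) * math.log(n_samples / doc_freq[feature])
            for feature, count in counts.items()
        })
    return weights


# Hypothetical control- and data-dependency features, one list per source file.
samples = [
    ["ctrl:if->call", "data:def->use", "data:def->use"],
    ["ctrl:if->call", "ctrl:loop->assign"],
    ["ctrl:if->call", "data:def->use"],
]

weights = tfidf_weights(samples)
for i, sample_weights in enumerate(weights):
    print(i, sample_weights)
```

Under this variant, a feature that appears in every sample (here `ctrl:if->call`) gets weight zero, while a feature confined to one sample (`ctrl:loop->assign`) gets the largest weight, which is the amplification effect the abstract refers to. The resulting weight vectors would then serve as input to a classifier.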


Key words: coupling degree, program dependency graph, authorship attribution