• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (04): 670-680.


Duplicate bug report detection by combining distributed representations of documents

ZENG Jie, BEN Ke-rong, ZHANG Xian, XU Yong-shi


  1. (College of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China)
  • Received: 2020-01-06 Revised: 2020-06-18 Accepted: 2021-04-25 Online: 2021-04-25 Published: 2021-04-21

Abstract: Duplicate bug report detection avoids repeatedly assigning and repairing multiple bug reports that describe the same bug, and thus greatly reduces the cost of software maintenance. To improve detection accuracy, this paper proposes a duplicate bug report detection method that incorporates distributed representations of documents. First, a Doc2Vec model is trained on a large-scale bug report database, and distributed representations of bug reports are extracted, so that variable-length bug reports are encoded as fixed-size dense vectors. Second, the similarity between two bug reports is computed by comparing their dense vectors; this similarity serves as a new feature that is combined with the traditional features commonly used in duplicate bug report detection, and a machine learning algorithm is used to train a binary classification model. Experimental results on public duplicate bug report datasets from Bugzilla show that, compared with the state-of-the-art method D_TS, the proposed method improves the F1 score by 2% on average, which indicates the effectiveness of the new feature.
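
The feature-construction step described in the abstract can be illustrated with a minimal sketch. It assumes the dense vectors have already been produced by a trained Doc2Vec model, and it assumes cosine similarity as the comparison measure; the abstract does not name the measure, so this is an illustrative choice rather than the authors' confirmed method. The function names `cosine_similarity` and `build_pair_features` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector is all zeros (an assumed convention).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_pair_features(
    vec_a: Sequence[float],
    vec_b: Sequence[float],
    traditional_features: Sequence[float],
) -> List[float]:
    """Append the dense-vector similarity to the traditional features.

    vec_a and vec_b stand in for the Doc2Vec vectors of two bug reports;
    traditional_features stands in for whatever hand-crafted features
    the classifier already uses for that pair.
    """
    return list(traditional_features) + [cosine_similarity(vec_a, vec_b)]


# Hypothetical pair of report vectors and two placeholder traditional features.
features = build_pair_features([1.0, 2.0], [2.0, 4.0], [0.5, 0.1])
```

The resulting list is one row of input for a binary classifier that predicts whether the two reports are duplicates; any standard classifier could consume it.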


Key words: duplicate bug report, distributed representations of documents, Doc2Vec model, machine learning algorithm
