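• Journal of the China Computer Federation
  • Chinese science and technology core journal
  • Chinese core journal

Computer Engineering & Science (计算机工程与科学)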


An unsupervised defect prediction method based on probability

LU Zheng-fa, XU Ling, ZHANG Xiao-hong, CHEN Lin, YANG Meng-ning

  1. (School of Software Engineering, Chongqing University, Chongqing 401331, China)
  • Received: 2015-12-14 Revised: 2016-03-09 Online: 2017-05-25 Published: 2017-05-25
  • Supported by:

    National Natural Science Foundation of China (91118005)

Abstract:

Software defect prediction can improve the efficiency of software development and testing and help ensure software quality. Unsupervised defect prediction methods need no labeled data, so they can be applied quickly in engineering practice. We propose a probability-based unsupervised defect prediction method, PCLA (probabilistic clustering and labeling). PCLA maps the difference between each metric value and its threshold to a probability, uses that probability to assess how likely a class is to be defective, and then completes the prediction through clustering and labeling. This addresses the information loss that arises in existing unsupervised methods, which compare metric values with thresholds directly and are therefore sensitive to the threshold. We apply PCLA to seven software projects from two groups of datasets, NetGen and Relink. Experimental results show that, compared with existing unsupervised methods, PCLA improves recall, precision, and F-measure by 4.1%, 2.52%, and 3.14% on average, respectively.

Key words: software defect prediction, unsupervised, software metric, probability, PCLA
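
The idea summarized in the abstract, replacing a hard threshold comparison with a probability, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the mapping function, the thresholds, or the clustering rule. The sketch assumes per-metric median thresholds, a logistic mapping of the range-normalized difference, and a simple two-group split at the median score in place of the clustering and labeling step. The names `pcla_sketch` and `scale` are hypothetical.

```python
import math
from statistics import median


def pcla_sketch(rows, scale=1.0):
    """Score classes by probability instead of a hard threshold test.

    rows: one list of metric values per class (all the same length).
    Returns (scores, labels); a True label marks a class predicted defective.
    Every modeling choice below is an assumption made for illustration.
    """
    n_metrics = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_metrics)]

    # Assumption: the threshold of each metric is its median over all classes.
    thresholds = [median(col) for col in columns]
    # Assumption: differences are divided by the metric's range so that
    # metrics on different scales contribute comparably.
    spreads = [(max(col) - min(col)) or 1.0 for col in columns]

    scores = []
    for row in rows:
        # Assumption: a logistic function maps (value - threshold) to (0, 1).
        probs = [
            1.0 / (1.0 + math.exp(-scale * (value - t) / s))
            for value, t, s in zip(row, thresholds, spreads)
        ]
        scores.append(sum(probs) / n_metrics)

    # Assumption: a two-group split at the median score stands in for the
    # clustering and labeling step.
    cutoff = median(scores)
    labels = [score > cutoff for score in scores]
    return scores, labels


if __name__ == "__main__":
    metrics = [[1, 1], [2, 2], [10, 10], [20, 20]]
    scores, labels = pcla_sketch(metrics)
    for row, score, label in zip(metrics, scores, labels):
        print(row, round(score, 3), label)
```

Under these assumptions, a metric value just below its threshold contributes a probability slightly under 0.5 rather than a flat zero, so small differences near the threshold still influence the score.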