• Journal of the China Computer Federation
• China Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• Software Engineering

Software defect prediction based on deep autoencoder networks

ZHOU Mo (周末), XU Ling (徐玲), YANG Mengning (杨梦宁), LIAO Shengping (廖胜平), YAN Meng (鄢萌)

  1. (School of Big Data & Software Engineering, Chongqing University, Chongqing 401331, China)
  • Received: 2017-09-07 Revised: 2018-03-20 Online: 2018-10-25 Published: 2018-10-25

Abstract:

Software defect prediction is an effective way to improve software quality, and how well a defect prediction method performs depends closely on the characteristics of the data set it is applied to. To address the redundant feature information and high dimensionality of defect data sets, and drawing on the strong feature-learning ability of deep learning, we propose a software defect prediction method based on deep autoencoder networks. The method first applies a sampling method based on unsupervised learning to the data sets of six open-source projects to resolve their class imbalance. It then trains a deep autoencoder network model that reduces the dimensionality of the data sets. The end of the model is connected to three classifiers, which are trained on the dimension-reduced training set; the test set is then used for prediction. Experimental results show that, on data sets with high dimensionality and redundant feature information, the proposed method outperforms both the baseline software defect prediction model and defect prediction models based on existing feature extraction methods, and that it is applicable to different classification algorithms.
 

Key words: software defect prediction, feature dimension reduction, deep autoencoder network, class imbalance
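The abstract describes a three-stage pipeline: rebalance the training data with unsupervised sampling, compress the features with a deep autoencoder, then train classifiers on the compressed features and predict the test set. The Python sketch below illustrates that flow under explicit assumptions. The abstract does not name the sampling method, the network sizes, or the three classifiers, so k-means-based undersampling of the majority class, a small tanh autoencoder written in NumPy, and a single logistic-regression classifier are stand-ins chosen for illustration. The helper names (`kmeans_undersample`, `train_autoencoder`, `train_logreg`) are hypothetical, and the data are synthetic, not the six open-source projects.

```python
import numpy as np

def kmeans_undersample(X, y, iters=20, seed=0):
    """Assumed stand-in for the paper's unsupervised sampling: k-means the
    majority class (label 0) into k = #minority clusters, then keep the real
    majority instance nearest to each centroid. Label 1 = defective."""
    rng = np.random.default_rng(seed)
    X_maj, X_min = X[y == 0], X[y == 1]
    k = min(len(X_min), len(X_maj))
    centers = X_maj[rng.choice(len(X_maj), k, replace=False)]
    for _ in range(iters):
        dist = ((X_maj[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(axis=1)              # nearest centroid per instance
        for j in range(k):
            if np.any(labels == j):               # empty clusters keep old center
                centers[j] = X_maj[labels == j].mean(axis=0)
    dist = ((X_maj[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    keep = np.unique(dist.argmin(axis=0))         # nearest instance per centroid
    X_bal = np.vstack([X_maj[keep], X_min])
    y_bal = np.concatenate([np.zeros(len(keep), int), np.ones(len(X_min), int)])
    return X_bal, y_bal

def train_autoencoder(X, sizes=(8, 3), epochs=300, lr=0.05, seed=0):
    """Fully connected autoencoder d -> 8 -> 3 -> 8 -> d (hypothetical sizes),
    tanh hidden units, linear output, full-batch gradient descent on mean
    squared reconstruction error. Returns the encoder half and loss history."""
    rng = np.random.default_rng(seed)
    dims = [X.shape[1], *sizes, *sizes[-2::-1], X.shape[1]]
    Ws = [rng.normal(0, 1 / np.sqrt(a), (a, b)) for a, b in zip(dims[:-1], dims[1:])]
    bs = [np.zeros(b) for b in dims[1:]]
    last = len(Ws) - 1
    losses = []
    for _ in range(epochs):
        acts = [X]                                # forward pass, keep activations
        for i, (W, b) in enumerate(zip(Ws, bs)):
            z = acts[-1] @ W + b
            acts.append(z if i == last else np.tanh(z))
        losses.append(np.mean((acts[-1] - X) ** 2))
        delta = 2 * (acts[-1] - X) / X.size       # dLoss/d(output pre-activation)
        for i in range(last, -1, -1):             # backward pass
            gW, gb = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:                             # back through tanh of layer i-1
                delta = (delta @ Ws[i].T) * (1 - acts[i] ** 2)
            Ws[i] -= lr * gW
            bs[i] -= lr * gb

    def encode(Z):
        # Apply only the encoder layers; the last one yields the low-dim code.
        for W, b in zip(Ws[:len(sizes)], bs[:len(sizes)]):
            Z = np.tanh(Z @ W + b)
        return Z
    return encode, losses

def train_logreg(Z, y, epochs=500, lr=0.5):
    """Plain logistic regression: one illustrative classifier on the codes."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(Z @ w + b)))
        w -= lr * Z.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return lambda Zt: (Zt @ w + b >= 0).astype(int)

# Synthetic stand-in data (NOT the paper's data): 20 redundant metrics driven
# by 3 latent factors, with roughly 10% of modules labelled defective.
rng = np.random.default_rng(42)
latent = rng.normal(size=(400, 3))
X = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(400, 20))
y = (latent[:, 0] > 1.3).astype(int)
X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]
mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
X_tr, X_te = (X_tr - mu) / sd, (X_te - mu) / sd

X_bal, y_bal = kmeans_undersample(X_tr, y_tr)   # 1. rebalance training set
encode, losses = train_autoencoder(X_bal)        # 2. learn 20 -> 3 codes
clf = train_logreg(encode(X_bal), y_bal)         # 3. classifier on codes
pred = clf(encode(X_te))                         # 4. predict the test set
print(f"reconstruction MSE {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"test accuracy {np.mean(pred == y_te):.2f}")
```

Because the classifier only sees the output of `encode`, `train_logreg` can be swapped for any other classifier trained on the same codes. This is how the sketch mirrors the paper's statement that the three classifiers share one dimension-reduced training set. The accuracy printed here says nothing about the paper's reported results.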