• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学) ›› 2022, Vol. 44 ›› Issue (02): 257-265.

• Software Engineering •

基于代理辅助多目标萤火虫算法的软件缺陷预测方法研究

曹良林1,2,贲可荣1,张献1   

  1. (1.海军工程大学电子工程学院,湖北 武汉 430033;2.九江学院计算机与大数据科学学院,江西 九江 332005)
  • 收稿日期:2021-10-09 修回日期:2021-12-01 接受日期:2022-02-25 出版日期:2022-02-25 发布日期:2022-02-17
  • Funding:
    National Natural Science Foundation of China (61763019)

A surrogate-assisted multi-objective firefly algorithm for software defect prediction

CAO Liang-lin (1,2), BEN Ke-rong (1), ZHANG Xian (1)

  (1. School of Electronic Engineering, Naval University of Engineering, Wuhan 430033, China;
   2. School of Computer and Big Data Science, Jiujiang University, Jiujiang 332005, China)


  • Received:2021-10-09 Revised:2021-12-01 Accepted:2022-02-25 Online:2022-02-25 Published:2022-02-17

摘要: 针对软件缺陷预测中数据维度的复杂化和类不平衡问题,提出一种基于代理辅助模型的多目标萤火虫算法(SMO-MSFFA)的软件缺陷预测方法。该方法采用了多组策略萤火虫算法(MSFFA),以最小化数据的特征选择比率和最大化模型评测AUC值为多目标目标函数,分别以随机森林(RF)、支持向量机(SVM)和K近邻分类算法(KNN)为分类器构建软件缺陷预测模型。考虑到进化算法自身的迭代特点,嵌入代理模型离线完成部分个体评价函数的计算,以缩短计算耗时。在公开数据集NASA中的PC1、KC1和MC1项目上进行实验验证,与NSGA-II方法相比,在项目PC1、KC1和MC1上模型AUC均值分别提升0.17、降低0.01和提升0.09,平均特征选择比率分别降低0.08,0.17和0.05,平均耗时分别增加131 s,降低了199 s和降低了431 s。实验结果表明,提出的方法在提高模型性能、降低特征选择比率和缩短计算耗时方面具有明显的优势。

关键词: 软件缺陷预测, 机器学习, 多目标, 萤火虫算法, 代理辅助

Abstract: To address high data dimensionality and class imbalance in software defect prediction (SDP), a prediction method based on a surrogate-assisted multi-objective firefly algorithm (SMO-MSFFA) is proposed. The method uses the multi-group strategy firefly algorithm (MSFFA) with two objectives: minimizing the feature selection ratio and maximizing the model's AUC. Random forest (RF), support vector machine (SVM), and K-nearest neighbor (KNN) classifiers are each used to build the defect prediction models. Because evolutionary algorithms are iterative, an embedded surrogate model evaluates the fitness of part of the individuals offline to reduce computation time. In experiments on the PC1, KC1, and MC1 projects of the public NASA dataset, compared with NSGA-II, the method increases the mean AUC by 0.17 on PC1, decreases it by 0.01 on KC1, and increases it by 0.09 on MC1; it decreases the average feature selection ratio by 0.08, 0.17, and 0.05 on PC1, KC1, and MC1, respectively; and it increases the average running time by 131 s on PC1 while decreasing it by 199 s and 431 s on KC1 and MC1, respectively. The experimental results show that the proposed method has clear advantages in improving model performance, reducing the feature selection ratio, and shortening computation time.


Key words: software defect prediction, machine learning, multi-objective, firefly algorithm, surrogate-assisted model
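
The two objectives named in the abstract (a low feature selection ratio and a high AUC) can be illustrated with a minimal Python sketch. This is not the authors' SMO-MSFFA implementation: the firefly search, the classifiers, and the surrogate model are omitted, and every function name here (`feature_selection_ratio`, `auc`, `dominates`, `pareto_front`) is hypothetical. The sketch only assumes that a candidate solution is a binary feature mask and that candidates are compared by Pareto dominance, as is usual in multi-objective optimization.

```python
from typing import List, Sequence, Tuple


def feature_selection_ratio(mask: Sequence[int]) -> float:
    """Fraction of features kept by a binary mask (objective to minimize)."""
    return sum(mask) / len(mask)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive sample is scored
    higher than a random negative one; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """a and b are (ratio, auc) pairs. a dominates b if it is no worse on
    both objectives (lower ratio, higher AUC) and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

In a full pipeline, each mask would select columns of the defect dataset, a classifier such as RF, SVM, or KNN would be trained on them, and its test-set scores would be passed to `auc`; the resulting `(ratio, auc)` pairs are what a multi-objective search would rank.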