• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (05): 830-839.

• Software Engineering •

A software defect prediction algorithm based on optimized random forest

TANG Yu¹, DAI Qi², YANG Zhi-wei¹, YANG Ai-min¹, CHEN Li-fang¹,³

  (1. College of Science, North China University of Science and Technology, Tangshan 063210;
    2. Department of Automation, China University of Petroleum (Beijing), Beijing 102249;
    3. Hebei Key Laboratory of Data Science and Application, Tangshan 063210, China)
  • Received: 2022-07-03 Revised: 2022-10-24 Accepted: 2023-05-25 Online: 2023-05-25 Published: 2023-05-16
  • Funding:
    National Natural Science Foundation of China (52074126)

Abstract: Traditional random forests applied to software defect prediction suffer from low prediction accuracy and from parameters that are difficult to optimize. To address these problems, a software defect prediction algorithm is proposed that optimizes the random forest parameters with a fractional-order mutation sparrow search algorithm (FMSSA-RF). First, fractional-order mutation is used to improve the global search capability of the conventional sparrow search algorithm, giving FMSSA; on four benchmark test functions, FMSSA converges faster and reaches higher optimization accuracy. Second, FMSSA is used to optimize the random forest parameters. Finally, the FMSSA-RF algorithm is applied to software defect prediction. Experimental results on 10 public software defect datasets from 4 projects show that the evaluation metrics of FMSSA-RF are clearly better than those of the three comparison algorithms, indicating that FMSSA-RF has higher prediction accuracy and better stability. Friedman ranking and Holm's post-hoc test further show that the advantage of FMSSA-RF is statistically significant.

Key words: fractional-order mutation sparrow algorithm, random forest, software defect prediction
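
The abstract names Friedman ranking and Holm's post-hoc test but gives no details of how they were computed. The following is a minimal sketch of the two generic procedures, not the authors' implementation: `average_ranks` computes per-algorithm mean ranks across datasets (rank 1 is the best score, ties share the mean rank), and `holm_reject` applies Holm's step-down correction to a set of p-values. The function names, the significance level `alpha = 0.05`, and the toy scores and p-values are all assumptions made for illustration; none of them are results from the paper.

```python
from typing import Dict, List


def average_ranks(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Friedman-style average ranks: rank 1 = highest score on each dataset.

    Tied algorithms share the mean of the rank positions they span.
    """
    names = list(scores)
    n_datasets = len(scores[names[0]])
    totals = {name: 0.0 for name in names}
    for d in range(n_datasets):
        ordered = sorted(names, key=lambda a: -scores[a][d])
        i = 0
        while i < len(ordered):
            j = i
            # Extend j over the run of algorithms tied with ordered[i].
            while j + 1 < len(ordered) and scores[ordered[j + 1]][d] == scores[ordered[i]][d]:
                j += 1
            shared = (i + 1 + j + 1) / 2.0  # mean of 1-based positions i+1 .. j+1
            for k in range(i, j + 1):
                totals[ordered[k]] += shared
            i = j + 1
    return {name: totals[name] / n_datasets for name in names}


def holm_reject(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Holm's step-down procedure over m hypotheses.

    The i-th smallest p-value (i = 0, 1, ...) is compared with alpha / (m - i);
    testing stops at the first p-value that exceeds its threshold.
    """
    m = len(p_values)
    rejected = {name: False for name in p_values}
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    for i, (name, p) in enumerate(ordered):
        if p > alpha / (m - i):
            break  # this and all larger p-values are retained
        rejected[name] = True
    return rejected


# Toy inputs (hypothetical, for illustration only): three algorithms, three datasets.
toy_scores = {"A": [0.9, 0.8, 0.7], "B": [0.8, 0.9, 0.6], "C": [0.7, 0.7, 0.5]}
toy_ranks = average_ranks(toy_scores)

# Toy p-values (hypothetical) for three pairwise comparisons against a control.
toy_decisions = holm_reject({"h1": 0.01, "h2": 0.04, "h3": 0.03})
```

In a typical workflow, the average ranks feed a Friedman test over all algorithms, and Holm's procedure is then applied to the p-values of pairwise comparisons between the best-ranked algorithm and each competitor; how those p-values were obtained in the paper is not stated in the abstract.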