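• Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)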


Alert classification based on cost-sensitive neural networks

PAN Zhi-hui, YANG Dan, ZHANG Xiao-hong, XU Ling

  1. (School of Software Engineering, Chongqing University, Chongqing 401331, China)
  • Received: 2015-12-29 Revised: 2016-03-17 Online: 2017-06-25 Published: 2017-06-25
  • Supported by:

    National Natural Science Foundation of China (91118005)

Abstract:

Static analysis tools help developers locate potentially defective code early in development. However, studies show that such tools tend to report a large number of alerts, most of which are false positives. To improve the usability of static analysis tools, researchers typically use statistical and machine learning methods to classify alerts as actionable or unactionable. Existing alert classification methods, however, account for neither the class imbalance caused by the large number of false positives nor the unequal costs of different misclassifications. To address these problems, we apply BP neural networks and cost-sensitive neural networks based on over-sampling, threshold-moving, and under-sampling to the classification of actionable alerts. Experimental results show that, compared with BP neural networks, the cost-sensitive neural network methods increase the recall of actionable alerts by 44.07% on average, and that when the cost of misclassifying an actionable alert exceeds a certain value, the cost-sensitive methods achieve a lower classification cost.

Key words: actionable alert, unactionable alert, cost-sensitive, class imbalance, neural networks
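
The abstract names three cost-sensitive strategies (over-sampling, under-sampling, and threshold-moving) without giving their formulation, so the following is only a minimal sketch of the general ideas, not the authors' implementation. It assumes a binary setting (1 = actionable, 0 = unactionable), a classifier that already outputs an estimated probability that an alert is actionable, and cost-proportionate rebalancing. The function names, the 5:1 cost ratio, and the probabilities are all hypothetical and chosen for illustration.

```python
import random

def threshold_moving(probs, cost_fn, cost_fp):
    """Predict 1 (actionable) when the expected cost of missing the alert,
    p * cost_fn, is at least the expected cost of a false alarm,
    (1 - p) * cost_fp. This moves the threshold from 0.5 to
    cost_fp / (cost_fp + cost_fn)."""
    threshold = cost_fp / (cost_fp + cost_fn)
    return [1 if p >= threshold else 0 for p in probs]

def total_cost(y_true, y_pred, cost_fn, cost_fp):
    """Sum of misclassification costs over all alerts."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * cost_fn + fp * cost_fp

def recall(y_true, y_pred):
    """Fraction of actionable alerts that were predicted actionable."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(1 for t in y_true if t == 1)
    return tp / positives if positives else 0.0

def resample_by_cost(y, cost_fn, cost_fp, mode, seed=0):
    """Return training-set indices rebalanced by the cost ratio
    (assumed simplification: cost_fn >= cost_fp).
    mode="over":  duplicate actionable alerts until their count is
                  multiplied by cost_fn / cost_fp.
    mode="under": keep only a cost_fp / cost_fn fraction of the
                  unactionable alerts."""
    rng = random.Random(seed)
    act = [i for i, t in enumerate(y) if t == 1]
    unact = [i for i, t in enumerate(y) if t == 0]
    ratio = cost_fn / cost_fp
    if mode == "over":
        n_extra = max(round(len(act) * ratio) - len(act), 0)
        extra = rng.choices(act, k=n_extra) if act else []
        return act + unact + extra
    if mode == "under":
        n_keep = min(len(unact), max(1, round(len(unact) / ratio)))
        return act + rng.sample(unact, n_keep)
    raise ValueError("mode must be 'over' or 'under'")

# Illustrative data only: 3 actionable alerts among 10, with made-up scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.45, 0.30, 0.10, 0.40, 0.20, 0.15, 0.10, 0.05, 0.05, 0.02]
COST_FN, COST_FP = 5, 1  # hypothetical: a missed actionable alert costs 5x

plain = threshold_moving(probs, 1, 1)              # equal costs: threshold 0.5
moved = threshold_moving(probs, COST_FN, COST_FP)  # threshold 1/6

print("plain:", recall(y_true, plain), total_cost(y_true, plain, COST_FN, COST_FP))
print("moved:", recall(y_true, moved), total_cost(y_true, moved, COST_FN, COST_FP))
print("over-sampled size:", len(resample_by_cost(y_true, COST_FN, COST_FP, "over")))
print("under-sampled size:", len(resample_by_cost(y_true, COST_FN, COST_FP, "under")))
```

On this toy data the default 0.5 threshold misses every actionable alert (recall 0, total cost 15), while the moved threshold recovers two of the three at the price of two false alarms (total cost 7). This mirrors the trade-off the abstract describes: cost-sensitive methods pay off only once a missed actionable alert is sufficiently more expensive than a false alarm.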