• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (10): 1952-1960.

• Articles •

A Fault Localization Method Based on a Multivariate Logistic Model

鞠小林1, 2,姜淑娟1,陈翔2,曹鹤玲1,王兴亚1   

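  1. (1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116; 2. School of Computer Science and Technology, Nantong University, Nantong, Jiangsu 226019)
  • Received: 2014-06-13 Revised: 2014-08-15 Online: 2014-10-25 Published: 2014-10-25
  • Supported by:

    National Natural Science Foundation of China (61202006, 61340037); Fundamental Research Funds for the Central Universities (2013QNB17); Natural Science Research Project of Jiangsu Higher Education Institutions (12KJB520014); Jiangsu Graduate Training Innovation Project (CXZZ12_0935)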

A fault localization approach using a multivariate Logistic regression model

JU Xiaolin1, 2,JIANG Shujuan1,CHEN Xiang2,CAO Heling1,WANG Xingya1   

  1. (1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116;
    2. School of Computer Science and Technology, Nantong University, Nantong 226019, China)
  • Received: 2014-06-13 Revised: 2014-08-15 Online: 2014-10-25 Published: 2014-10-25

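Abstract:

Fault localization is an important part of the software development process, and making full use of a program's structural and behavioral features helps improve its efficiency. This paper proposes a fault localization framework based on multivariate Logistic regression analysis that localizes faults at the class-method level in new versions of a program as the software evolves. First, a set of metrics for structural and behavioral features is designed; a feature data set for the old version of the program is collected and built through static analysis and test execution, and fault information for the old version is obtained from the defect tracking system. Second, based on the feature data set and the fault information, univariate analysis is applied to select the metrics that are significantly correlated with faults, and the selected metrics are then used in a multivariate analysis to train a multivariate Logistic model. Finally, a feature data set for the new version is collected and built from the selected metrics, the trained Logistic model is used to predict the fault probability of each class method, and the class methods are inspected in descending order of fault probability to locate faults. An empirical study of fault localization on a set of open-source programs shows that the multivariate Logistic model can improve the efficiency of fault localization.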

Key words: fault localization, multivariate Logistic analysis, software measurement, software testing
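
For reference, the standard form of a multivariate Logistic regression model is written out below. The symbols are assumptions introduced here for illustration and do not come from the paper: $x_1,\dots,x_k$ stand for the selected metric values of one class method, and $\beta_0,\dots,\beta_k$ for the coefficients estimated from the old version.

```latex
\[
P(\text{faulty} \mid x_1,\dots,x_k)
  = \frac{1}{1 + e^{-(\beta_0 + \sum_{i=1}^{k} \beta_i x_i)}},
\qquad
\ln\frac{P}{1-P} = \beta_0 + \sum_{i=1}^{k} \beta_i x_i .
\]
```

Under this form, ranking class methods by descending $P$ is equivalent to ranking them by descending value of the linear term $\beta_0 + \sum_i \beta_i x_i$, because the logistic function is strictly increasing.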

Abstract:

Fault localization plays an important role in software development. Combining the structural features and behavioral characteristics of a program can benefit fault localization. A framework based on a multivariate Logistic regression model is proposed for locating faults in evolving software. First, a feature data set is constructed for the old version by statically analyzing the program and tracing its test executions against a set of designed metrics of structural features and behavioral characteristics. Meanwhile, the fault information of the old version is obtained from the bug tracking system. Second, a univariate analysis is performed to select the metrics that are significantly related to faults, and a multivariate Logistic model is then trained on the selected metrics using the constructed feature data set and the tracked fault information. Finally, the trained Logistic model is applied to the feature data set of a new version of the evolved software to predict the faulty class methods. An empirical study on a set of benchmarks indicates that the multivariate Logistic model can improve the effectiveness of fault localization.

Key words: fault localization; multivariate logistic analysis; software measurement; software testing
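
The three steps summarized in the abstract (univariate screening, multivariate training, ranking by predicted probability) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the metric names (`failed_cov`, `is_static`, `complexity`), the toy data, the method names, the likelihood-ratio test used for screening, the 0.05 threshold, and the gradient-ascent fitting routine are all hypothetical choices made here, since the abstract does not specify them.

```python
import math

def log_sigmoid(z):
    # Numerically stable log of the logistic function.
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def linear(w, x):
    # Linear predictor; w[0] is the intercept.
    return w[0] + sum(a * b for a, b in zip(w[1:], x))

def standardize(rows):
    # Z-score every column; also return the (mean, std) pairs that were used.
    stats = []
    for col in zip(*rows):
        m = sum(col) / len(col)
        s = math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
        stats.append((m, s))
    return [[(v - m) / s for v, (m, s) in zip(r, stats)] for r in rows], stats

def fit(X, y, lr=0.5, epochs=2000):
    # Batch gradient ascent on the Logistic log-likelihood.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = t - math.exp(log_sigmoid(linear(w, x)))
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        w = [a + lr * g / len(X) for a, g in zip(w, grad)]
    return w

def log_lik(w, X, y):
    return sum(log_sigmoid(linear(w, x) if t else -linear(w, x))
               for x, t in zip(X, y))

def univariate_p(col, y):
    # Likelihood-ratio test: one-metric model vs. intercept-only model.
    X, _ = standardize([[v] for v in col])
    empty = [[] for _ in y]
    stat = 2.0 * (log_lik(fit(X, y), X, y) - log_lik(fit(empty, y), empty, y))
    return math.erfc(math.sqrt(max(0.0, stat) / 2.0))  # chi-square tail, 1 d.o.f.

# Step 1 (hypothetical data): one row of metrics per class method of the old
# version, plus a 0/1 label taken from the bug tracker.
names = ["failed_cov", "is_static", "complexity"]
old_metrics = [[i / 19, i % 2, c] for i, c in enumerate(
    [1, 2, 1, 2, 3, 1, 2, 3, 5, 2, 3, 3, 4, 5, 3, 6, 4, 5, 6, 4])]
old_faulty = [0] * 8 + [1, 0, 1, 0] + [1] * 8

# Step 2: keep metrics that are significant on their own, then train the
# multivariate model on the kept metrics only.
ALPHA = 0.05  # assumed significance level
selected = [j for j in range(len(names))
            if univariate_p([r[j] for r in old_metrics], old_faulty) < ALPHA]
train, stats = standardize([[r[j] for j in selected] for r in old_metrics])
model = fit(train, old_faulty)

# Step 3: score the class methods of the new version and rank them.
new_methods = {"Parser.parse": [0.9, 0, 5],
               "View.render": [0.1, 1, 1],
               "Store.save": [0.5, 0, 3]}

def fault_probability(metrics):
    x = [(metrics[j] - m) / s for j, (m, s) in zip(selected, stats)]
    return math.exp(log_sigmoid(linear(model, x)))

ranking = sorted(new_methods,
                 key=lambda n: fault_probability(new_methods[n]), reverse=True)
for name in ranking:
    print(f"{name}: {fault_probability(new_methods[name]):.3f}")
```

In this toy data the `is_static` flag is only weakly related to the labels, so the screening step drops it and the multivariate model is trained on the other two metrics. In practice a statistics package would normally replace the hand-written fitting and testing routines; they are spelled out here only to keep the sketch self-contained.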