• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2015, Vol. 37 ›› Issue (05): 930-936.

• Articles •

Research on software defect prediction based on
integrated sampling and ensemble learning 

DAI Xiang¹, MAO Yuguang¹,²

  (1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016;
    2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China)
  • Received: 2014-04-10  Revised: 2014-05-26  Online: 2015-05-25  Published: 2015-05-25

Abstract:

We study the class imbalance problem in software defect prediction and propose an integrated sampling method for classifying class-imbalanced data, with the aim of improving classification ability. To avoid the blindness of random sampling, the integrated sampling method balances the dataset in two ways: SMOTE over-samples the minority class, and K-Means clustering down-samples the majority class. Once a balanced dataset is obtained, multiple single classifiers are combined through ensemble learning. Experimental results show that the resulting software defect prediction algorithm, which combines integrated sampling with ensemble learning, achieves better classification performance: it obtains a higher true positive rate while significantly reducing the false alarm rate.

Key words: unbalanced dataset; SMOTE; K-Means; vote; ensemble learning
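
The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. The abstract does not say how the K-Means clusters are turned into a reduced majority class, so this sketch keeps the real sample nearest each centroid. The function names (`smote`, `kmeans_undersample`, `integrated_sampling`, `majority_vote`) and the `target` parameter are hypothetical. The base classifiers are stand-in callables returning 0 or 1, combined by simple majority voting as the "vote" key word suggests.

```python
import math
import random


def smote(minority, n_new, k=3, rng=random):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours."""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Nearest neighbours of x inside the minority class, excluding x itself.
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def kmeans_undersample(majority, n_keep, iters=20, rng=random):
    """Cluster the majority class into n_keep groups and keep, for each
    non-empty cluster, the real sample closest to its centroid.
    (Assumption: the abstract does not specify this selection rule.)"""
    centroids = rng.sample(majority, n_keep)
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in majority:
            idx = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else cen
            for members, cen in zip(clusters, centroids)
        ]
    return [min(members, key=lambda p: math.dist(p, cen))
            for members, cen in zip(clusters, centroids) if members]


def integrated_sampling(majority, minority, target, k=3, rng=random):
    """Over-sample the minority class up to `target` samples and down-sample
    the majority class to at most `target` samples."""
    new_minority = minority + smote(minority, target - len(minority), k, rng)
    new_majority = kmeans_undersample(majority, target, rng=rng)
    return new_majority, new_minority


def majority_vote(classifiers, x):
    """Combine base classifiers by simple majority voting (ties go to class 0)."""
    votes = sum(clf(x) for clf in classifiers)
    return 1 if votes * 2 > len(classifiers) else 0


# Toy usage: 3 defective modules (minority) versus 20 clean ones (majority).
rng = random.Random(0)
defective = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
clean = [(float(i), float(j)) for i in range(5, 10) for j in range(5, 9)]
clean_bal, defective_bal = integrated_sampling(clean, defective, target=8, k=2, rng=rng)
print(len(clean_bal), len(defective_bal))
```

In practice the balanced sets would be used to train each base classifier before voting. The sketch stops at the sampling and voting steps because the abstract does not name the classifiers used.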