  • Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4, 2012, Vol. 34, Issue 10: 128-134.


A Novel Classification Algorithm for ImbalancedDatasets Based on Hybrid Resampling Strategy

GU Qiong,YUAN Lei, NING Bin,WU Zhao, HUA Li,LI Wenxin   

  1. School of Mathematics and Computer Science, Hubei University of Arts and Science, Xiangyang 441053, China
  • Received: 2012-04-25; Revised: 2012-07-10; Online: 2012-10-25; Published: 2012-10-25

Abstract:

Imbalanced data is a common problem in classification: it occurs when one class has far fewer examples than the other classes. Because it arises in many real-world applications, it has attracted growing attention from researchers. Learning classifiers from datasets with imbalanced class distributions is a challenging problem in the data mining and pattern recognition communities. In this paper, we present a novel preprocessing approach that combines unsupervised clustering with supervised learning to handle imbalanced datasets, and we apply it to training an SMO (sequential minimal optimization) classifier. The proposed algorithm first reduces the imbalance ratio by constructing new samples with an improved synthetic minority over-sampling technique (SMOTE), and then clusters both classes to delete redundant or noisy samples. The useful samples are thereby retained and computational efficiency improves. Experimental results show that the proposed approach effectively improves classification accuracy on the minority classes while maintaining overall classification performance.

Key words: classification; imbalanced dataset; preprocessing; hybrid resampling; SMOTE; clustering
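The abstract does not specify the improved SMOTE variant or the exact clustering and deletion rules, so the following is only a minimal sketch of a generic "oversample, then cluster and clean" pipeline under stated assumptions. Plain SMOTE interpolation stands in for the improved variant, and k-means stands in for the unspecified clustering step. The noise rule (a sample closer to the other class's centroids than to its own is dropped), the redundancy rule (a sample within the hypothetical `dup_eps` of an already-kept sample is dropped), the full-balance oversampling target, and all function names (`smote`, `kmeans`, `clean`, `hybrid_resample`) are illustrative assumptions, not the authors' method. Training the SMO classifier on the result is not shown.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two feature vectors (tuples of floats)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, rng=None):
    """Plain SMOTE (stand-in for the paper's improved variant): each synthetic
    sample lies on the segment between a random minority sample and one of
    its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic


def kmeans(points, k, iters=20, rng=None):
    """Lloyd's k-means (assumed clustering step); returns the final centroids."""
    rng = rng or random.Random(0)
    centroids = rng.sample(points, min(k, len(points)))
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Each centroid becomes its cluster mean; an empty cluster keeps the old one.
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[c]
            for c, cl in enumerate(clusters)
        ]
    return centroids


def clean(points, own_centroids, other_centroids, dup_eps):
    """Hypothetical deletion rule for noisy and redundant samples."""
    kept = []
    for p in points:
        d_own = min(dist(p, c) for c in own_centroids)
        d_other = min(dist(p, c) for c in other_centroids)
        if d_other < d_own:
            continue  # noisy: lies closer to the other class's clusters
        if any(dist(p, q) < dup_eps for q in kept):
            continue  # redundant: near-duplicate of a sample already kept
        kept.append(p)
    return kept


def hybrid_resample(majority, minority, n_clusters=2, dup_eps=0.05, seed=0):
    """Oversample the minority class up to the majority size (assumed target),
    then cluster both classes and remove noisy or redundant samples."""
    rng = random.Random(seed)
    n_new = max(0, len(majority) - len(minority))
    minority_all = list(minority) + smote(minority, n_new, rng=rng)
    maj_centroids = kmeans(list(majority), n_clusters, rng=rng)
    min_centroids = kmeans(minority_all, n_clusters, rng=rng)
    return (
        clean(majority, maj_centroids, min_centroids, dup_eps),
        clean(minority_all, min_centroids, maj_centroids, dup_eps),
    )


# Toy usage: 40 majority samples on a grid, 6 minority samples far away.
majority = [(float(x), float(y)) for x in range(5) for y in range(8)]
minority = [(20.0, 20.0), (20.0, 21.0), (21.0, 20.0),
            (21.0, 21.0), (20.5, 20.5), (22.0, 22.0)]
maj_out, min_out = hybrid_resample(majority, minority)
print(len(maj_out), len(min_out))
```

Because the original minority samples are placed before the synthetic ones, the redundancy check only ever discards synthetic or later samples; a real implementation would tune `n_clusters`, `dup_eps`, and the oversampling amount on validation data.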