• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (9): 130-135.

• Articles •

A Novel Cost Sensitive Learning Algorithm Based on Resampling

GU Qiong, YUAN Lei, NING Bin, XIONG Qijun, HUA Li, LI Wenxin

  1. (School of Mathematics and Computer Science, Xiangfan University, Xiangyang 441053, China)
  • Received: 2011-05-20 Revised: 2011-07-26 Online: 2011-09-25 Published: 2011-09-25

Abstract:

Most studies on imbalanced data set classification focus on either resampling or cost-sensitive learning in isolation, neglecting the fact that imbalanced class distributions and unequal misclassification costs always occur simultaneously. We propose a novel cost-sensitive learning (CSL) algorithm that combines resampling with CSL techniques to address the misclassification problem on imbalanced data sets. On the one hand, resampling produces balanced data sets by reconstructing both the majority and the minority class. On the other hand, classification is performed by minimizing the misclassification cost rather than maximizing accuracy, where the misclassification cost for the minority class is much higher than that for the majority class. A cost-sensitive learning procedure is then applied for classification. The experimental results show that the proposed method effectively improves classification accuracy and decreases misclassification cost, and that it outperforms traditional algorithms on imbalanced problems.
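
The two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give. It assumes plain random undersampling of the majority class and random oversampling of the minority class to a midpoint size, followed by a minimum-expected-cost decision rule. The names `hybrid_resample`, `min_cost_label`, `cost_fn`, and `cost_fp` are hypothetical.

```python
import random


def hybrid_resample(majority, minority, seed=0):
    """Balance two classes by resampling both of them (assumed scheme).

    The majority class is randomly undersampled and the minority class is
    randomly oversampled (with replacement) until both reach the midpoint
    size. Assumes the majority class is the larger one.
    """
    rng = random.Random(seed)
    target = (len(majority) + len(minority)) // 2
    # Undersample: keep `target` distinct majority samples.
    new_majority = rng.sample(majority, min(target, len(majority)))
    # Oversample: duplicate random minority samples up to `target`.
    extra = [rng.choice(minority) for _ in range(max(0, target - len(minority)))]
    return new_majority, list(minority) + extra


def min_cost_label(p_minority, cost_fn, cost_fp):
    """Pick the label with the lower expected misclassification cost.

    p_minority: estimated probability that the sample is minority (label 1).
    cost_fn:    cost of labelling a true minority sample as majority.
    cost_fp:    cost of labelling a true majority sample as minority.
    """
    cost_if_predict_majority = p_minority * cost_fn
    cost_if_predict_minority = (1.0 - p_minority) * cost_fp
    return 1 if cost_if_predict_minority < cost_if_predict_majority else 0
```

Under these assumptions, a 10:1 cost ratio moves the decision threshold from 0.5 to 1/11: a sample with an estimated minority probability of 0.2 is labelled minority, although a pure accuracy criterion would label it majority.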

Key words: classification; imbalanced dataset; hybrid resampling; cost sensitive learning