J4, 2012, Vol. 34, Issue (10): 128-134.
GU Qiong, YUAN Lei, NING Bin, WU Zhao, HUA Li, LI Wenxin
Abstract:
Imbalanced data is a common problem in classification: it occurs when one class has far fewer examples than the others. Because it arises in many real-world applications, it has attracted growing attention from researchers, and learning classifiers from datasets with imbalanced class distributions remains a challenging problem in the data mining and pattern recognition community. In this paper, we present a novel preprocessing approach that combines unsupervised clustering and supervised learning to handle imbalanced datasets, and we use it to train an SMO (sequential minimal optimization) classifier. The proposed algorithm first reduces the imbalance ratio by constructing new minority samples with an improved synthetic minority oversampling technique (SMOTE), and then clusters both classes to delete redundant or noisy samples. Useful samples are thus retained while computational efficiency improves. Experimental results show that the proposed approach can effectively improve the classification accuracy of the minority classes while maintaining overall classification performance.
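The abstract describes the hybrid resampling pipeline only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It uses standard SMOTE (interpolating between a minority sample and one of its k nearest minority-class neighbours) in place of the paper's unspecified improved variant, oversamples until both classes are the same size, and applies a hypothetical cleaning rule: each class is clustered with k-means, and samples unusually far from their centroid ("noise") or nearly identical to an already-kept sample ("redundant") are dropped. The function names `smote`, `kmeans`, `clean_class` and `hybrid_resample` and the parameters `k`, `n_clusters`, `z` and `eps` are illustrative.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """Standard SMOTE (not the paper's improved variant): each synthetic sample
    lies on the segment between a random minority sample and one of its k
    nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted((p for j, p in enumerate(minority) if j != i),
                            key=lambda p: _dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def assign():
        return [min(range(k), key=lambda c: _dist(p, centroids[c])) for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign()


def clean_class(points, n_clusters=3, z=2.0, eps=1e-3, seed=0):
    """Hypothetical cleaning rule: cluster one class with k-means, then drop
    'noisy' points farther than mean + z*std from their centroid and
    'redundant' points within eps of a point already kept in the same cluster."""
    centroids, labels = kmeans(points, min(n_clusters, len(points)), seed=seed)
    kept = []
    for c, centroid in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        d = [_dist(p, centroid) for p in members]
        mean = sum(d) / len(d)
        std = math.sqrt(sum((v - mean) ** 2 for v in d) / len(d))
        cluster_kept = []
        for p, dp in zip(members, d):
            if dp > mean + z * std:
                continue  # noise: unusually far from its cluster centre
            if any(_dist(p, q) < eps for q in cluster_kept):
                continue  # redundant: near-duplicate of a kept point
            cluster_kept.append(p)
        kept.extend(cluster_kept)
    return kept


def hybrid_resample(majority, minority, k=5, n_clusters=3, seed=0):
    """Oversample the minority class with SMOTE, then clean both classes by
    clustering. Returns (X, y), with y = 1 for the minority class."""
    # Assumption: oversample until both classes have the same size.
    n_new = max(0, len(majority) - len(minority))
    new_minority = minority + smote(minority, n_new, k, seed)
    maj = clean_class(majority, n_clusters, seed=seed)
    mino = clean_class(new_minority, n_clusters, seed=seed)
    return maj + mino, [0] * len(maj) + [1] * len(mino)


if __name__ == "__main__":
    rng = random.Random(42)
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(20)]
    X, y = hybrid_resample(majority, minority)
    print(len(X), "samples;", y.count(1), "minority after resampling")
```

The resulting `X` and `y` could then be passed to any SVM trainer based on sequential minimal optimization, such as Weka's `SMO` classifier. Oversampling to full balance and the z-score noise threshold are design choices made for this sketch only; the abstract states just that the imbalance ratio is reduced and that redundant or noisy samples are deleted.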
Key words: classification; imbalanced dataset; preprocessing; hybrid resampling; SMOTE; clustering
GU Qiong, YUAN Lei, NING Bin, WU Zhao, HUA Li, LI Wenxin. A Novel Classification Algorithm for Imbalanced Datasets Based on Hybrid Resampling Strategy[J]. J4, 2012, 34(10): 128-134.
URL: http://joces.nudt.edu.cn/EN/Y2012/V34/I10/128