• 中国计算机学会会刊
  • 中国科技核心期刊
  • 中文核心期刊

Computer Engineering & Science ›› 2021, Vol. 43 ›› Issue (03): 525-533.

Previous Articles     Next Articles

A cost-sensitive imbalanced data classification algorithm based on KPCA-Stacking

CAO Ting-ting,ZHANG Zhong-lin   

  1. (College of Electronic and Information Engineering,Lanzhou Jiaotong University,Lanzhou 730070,China)
  • Received:2019-12-31 Revised:2020-04-27 Accepted:2021-03-25 Online:2021-03-25 Published:2021-03-29

Abstract: Cost-sensitive learning is an important strategy to solve the problem of imbalanced data classification. The non-linearity of data characteristics also brings some difficulties to classification. In view of this problem, by combining cost-sensitive learning with kernel principal component analysis (KPCA), this paper proposes a cost-sensitive Stacking integration algorithm called KPCA-Stacking. 
Firstly, the original data set is over-sampled by the adaptive synthetic sampling method (ADASYN) and KPCA dimensionality reduction is performed; Secondly, KNN, LDA, SVM, and RF are converted into cost-sensitive algorithms according to the Bayesian risk minimization principle as the primary learner in the Stacking integrated learning framework, and logistic regression is used as the meta-learner. Compa- rative experiments on 10 algorithms such as J48 decision tree in 5 public datasets show that the cost- sensitive KPCA-Stacking algorithm improves the recognition rate of a few classes to a certain extent, and is better than the overall classification performance of a single model.


Key words: imbalanced data, cost-sensitive, KPCA, Stacking, ADASYN oversampling, classification