• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (5): 940-950.

• Artificial Intelligence and Data Mining •

Heterogeneous ensemble learning with feature subspace augmentation for imbalanced data

CHEN Lifang1,2, BAI Yun1, SHI Yonghui1, DAI Qi1

  (1. College of Science, North China University of Science and Technology, Tangshan 063210;
    2. Hebei Key Laboratory of Data Science and Application, Tangshan 063210, China)
  • Received:2024-01-15 Revised:2024-05-16 Online:2025-05-25 Published:2025-05-27

Abstract: On imbalanced data, traditional classifiers tend to favor the majority class at the expense of accuracy on the minority class, which degrades overall performance. To address this issue, a heterogeneous ensemble learning algorithm with feature subspace augmentation (HEL-FSA) for imbalanced data is proposed. First, the XGBoost algorithm is used to learn feature importance and to select the important features, which form a feature subspace of the dataset. Second, the SMOTE algorithm is used to generate new samples within this feature subspace, yielding more balanced training data. Third, five classifiers, namely Logistic Regression, Decision Tree, Multi-Layer Perceptron, Support Vector Machine, and XGBoost, are employed as base models, and the heterogeneous base models are fused using the if_any algorithm. Experimental results on nine imbalanced datasets verify the feasibility of the proposed algorithm. Additionally, when applied to cervical cancer risk prediction, the proposed algorithm improves the ability to understand and predict cervical cancer risk.
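
The three steps in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The importance scores are assumed to be given (in HEL-FSA they would come from XGBoost), `smote_like` is a simplified SMOTE-style interpolation rather than a library SMOTE, and `if_any` assumes the fusion rule means "predict the minority class if any base model does", which the abstract does not spell out. All function names, the toy data, and the three hypothetical vote lists (the paper uses five base models) are illustrative assumptions.

```python
import random


def select_subspace(importances, k):
    """Indices of the k most important features (scores assumed given, e.g. by XGBoost)."""
    order = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return sorted(order[:k])


def project(rows, cols):
    """Restrict every sample to the selected feature columns."""
    return [[row[j] for j in cols] for row in rows]


def smote_like(minority, n_new, k=2, seed=0):
    """Simplified SMOTE-style oversampling: interpolate between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Nearest neighbours by squared Euclidean distance in the subspace.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def if_any(votes):
    """Assumed fusion rule: label a sample minority (1) if any base model does."""
    return [1 if any(sample_votes) else 0 for sample_votes in zip(*votes)]


# Toy data: four features, hypothetical importance scores.
importances = [0.05, 0.60, 0.10, 0.25]
cols = select_subspace(importances, k=2)

minority = [[0.0, 1.0, 5.0, 2.0], [0.0, 2.0, 5.0, 4.0], [0.0, 3.0, 5.0, 3.0]]
sub = project(minority, cols)
new_samples = smote_like(sub, n_new=4)

# Predictions of three hypothetical base models on four test samples.
votes = [[0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
fused = if_any(votes)
```

Under this assumed rule, the fused prediction trades precision for recall on the minority class, since a single positive vote is enough to flag a sample.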

Key words: imbalanced data, feature selection, ensemble learning, synthetic minority over-sampling technique