• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (05): 788-799.

• High Performance Computing •

A Boosting classification algorithm for imbalanced drift data stream based on Hellinger distance

ZHANG Xi-long, HAN Meng, CHEN Zhi-qiang, WU Hong-xin, LI Mu-hang

  1. (School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China)
  • Received: 2021-11-09 Revised: 2022-01-15 Accepted: 2022-05-25 Online: 2022-05-25 Published: 2022-05-24

Abstract: Class imbalance in a data stream seriously degrades classification performance, and the emergence of concept drift is a further difficulty in data stream mining. To improve classification performance when both problems occur, a new Boosting Classification Algorithm for imbalanced drifting data streams based on Hellinger Distance (BCA-HD) is proposed. The algorithm combines instance-level and classifier-level weighting to update the ensemble dynamically and adapt to concept drift. The ensemble algorithm SMOTEBoost serves as the base classifier at the bottom layer and uses resampling to handle the imbalanced data. The proposed algorithm is compared with 9 other algorithms on 16 datasets containing abrupt and gradual drift. The results show that it ranks first in both the average value and the average ranking of G-mean and AUC. The experiments indicate that the algorithm adapts better to the simultaneous occurrence of concept drift and class imbalance, which helps to improve classification performance.
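
The abstract names two quantities without defining them: the Hellinger distance and the G-mean. The sketch below uses their standard textbook definitions only; it does not reproduce how BCA-HD applies them, which the abstract does not specify. The function names `hellinger_distance` and `g_mean`, and the use of class-count histograms as input, are assumptions made for illustration.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    p and q are non-negative weights over the same bins (for example,
    histogram counts from two windows of a stream). Both are normalised
    to sum to 1 before the distance is computed. The result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each distribution needs positive total mass")
    squared = sum(
        (math.sqrt(a / total_p) - math.sqrt(b / total_q)) ** 2
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / 2.0)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive rate and the true negative rate."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)
```

Identical distributions give a Hellinger distance of 0 and distributions with disjoint support give 1, so a large value between an older and a newer window is one possible sign that the stream has changed. The G-mean drops to 0 whenever a classifier ignores one class entirely, which is why it is commonly preferred over plain accuracy for imbalanced data.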

Key words: data stream, imbalanced data, concept drift, Boosting, Hellinger distance