• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science


An imbalanced data stream classification
algorithm based on ensemble learning

YUAN Quan1,2, GUO Jiang-fan1, ZHAO Xue-hua1

  (1. Research Center of New Telecommunication Technology Applications,
    School of Telecommunications and Information Engineering,
    Chongqing University of Posts and Telecommunications, Chongqing 400065;
    2. Chongqing Information Technology Designing Company Limited, Chongqing 401121, China)

     
  • Received: 2018-12-12 Revised: 2019-03-12 Online: 2019-08-25 Published: 2019-08-25

Abstract:

Most existing data stream classification algorithms assume that the class distribution is roughly balanced. In real data stream environments, however, the class distribution is often imbalanced and accompanied by concept drift. To address both problems, we propose an imbalanced data stream classification algorithm based on ensemble learning. First, to handle the class imbalance, a hybrid sampling method balances the data set before model training. Second, concept drift is handled with a base classifier weighting and elimination strategy. Finally, comparison experiments with other data stream classification algorithms are carried out on artificial and real data sets. The experimental results show that the proposed algorithm achieves better overall classification performance than the compared algorithms in data stream environments with concept drift and class imbalance.
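
The abstract names three ingredients without giving their details: sampling to balance each batch, weighting the base classifiers, and eliminating some of them. The following Python sketch illustrates one plausible reading of that pipeline and is not the authors' algorithm. Everything specific in it is an assumption made for illustration: the chunk-based processing, the nearest-centroid base learner (`CentroidClassifier`), resampling every class to the mean class size (`hybrid_sample`), weighting members by balanced accuracy on the newest chunk, and dropping the lowest-weighted member once the hypothetical `max_members` limit is exceeded.

```python
import random
from collections import Counter


def hybrid_sample(chunk, rng):
    """Resample every class to the mean class size (assumed balancing rule)."""
    by_class = {}
    for x, y in chunk:
        by_class.setdefault(y, []).append((x, y))
    target = len(chunk) // len(by_class)
    balanced = []
    for items in by_class.values():
        if len(items) >= target:
            # Majority side: undersample without replacement.
            balanced.extend(rng.sample(items, target))
        else:
            # Minority side: keep all items, then oversample with replacement.
            balanced.extend(items)
            balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced


class CentroidClassifier:
    """Stand-in base learner: predicts the class with the nearest centroid."""

    def fit(self, data):
        sums, counts = {}, Counter()
        for x, y in data:
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, x):
        def dist(y):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[y]))
        return min(self.centroids, key=dist)


def balanced_accuracy(clf, chunk):
    """Mean per-class recall, so the minority class counts as much as the majority."""
    hits, totals = Counter(), Counter()
    for x, y in chunk:
        totals[y] += 1
        hits[y] += clf.predict(x) == y
    return sum(hits[y] / totals[y] for y in totals) / len(totals)


class StreamEnsemble:
    def __init__(self, max_members=5, seed=0):
        self.max_members = max_members  # hypothetical capacity limit
        self.rng = random.Random(seed)
        self.members = []  # each entry is [classifier, weight]

    def update(self, chunk):
        # Re-weight existing members on the newest chunk; members trained on an
        # outdated concept score poorly and lose influence.
        for member in self.members:
            member[1] = balanced_accuracy(member[0], chunk)
        # Train a new member on a balanced version of the chunk.
        new_clf = CentroidClassifier().fit(hybrid_sample(chunk, self.rng))
        self.members.append([new_clf, 1.0])
        # Elimination: drop the lowest-weighted member when over capacity.
        if len(self.members) > self.max_members:
            self.members.remove(min(self.members, key=lambda m: m[1]))

    def predict(self, x):
        # Weighted majority vote over the current members.
        votes = Counter()
        for clf, weight in self.members:
            votes[clf.predict(x)] += weight
        return votes.most_common(1)[0][0]
```

In this sketch, `update` must be called with at least one labeled chunk before `predict`. Balanced accuracy is used for the weights because plain accuracy on an imbalanced chunk would reward members that ignore the minority class; the paper may use a different measure.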
 

Key words: data stream, concept drift, ensemble learning, imbalance