• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• Artificial Intelligence and Data Mining

An imbalanced data stream classification algorithm based on ensemble learning

YUAN Quan1,2, GUO Jiang-fan1, ZHAO Xue-hua1

  1. Research Center of New Telecommunication Technology Applications, School of Telecommunications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  2. Chongqing Information Technology Designing Company Limited, Chongqing 401121, China
  • Received: 2018-12-12 Revised: 2019-03-12 Online: 2019-08-25 Published: 2019-08-25

Abstract:

Most existing data stream classification algorithms assume that the class distribution is roughly balanced. In real data streams, however, the class distribution is often imbalanced, and the stream is frequently accompanied by concept drift. To address both problems, we propose an imbalanced data stream classification algorithm based on ensemble learning. First, to handle class imbalance, a hybrid sampling method balances the data set before model training. Then, a weighting and elimination strategy for the base classifiers handles concept drift, which improves the classification performance of the ensemble. Finally, the algorithm is compared with classical data stream classification algorithms on artificial and real data sets. The experimental results show that, in data stream environments with both concept drift and class imbalance, the proposed algorithm achieves better overall classification performance than the other algorithms.

Key words: data stream, concept drift, ensemble learning, imbalance
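
The two ideas named in the abstract, balancing each batch by hybrid sampling and then weighting and eliminating base classifiers, can be illustrated with a minimal Python sketch. The abstract does not specify the sampling scheme, the base learner, or the weighting formula, so everything below is an assumption made for illustration: `hybrid_sample`, `CentroidClassifier`, and `WeightedChunkEnsemble` are hypothetical names, sampling is plain random over- and undersampling to the mean class size, the base learner is a nearest-centroid stand-in, and a member's weight is simply its accuracy on the newest chunk. It is not the authors' algorithm.

```python
import random
from collections import Counter, defaultdict


def hybrid_sample(chunk, rng):
    """Balance a chunk of (features, label) pairs (assumed scheme):
    undersample large classes and oversample small ones to the mean class size."""
    by_label = defaultdict(list)
    for x, y in chunk:
        by_label[y].append((x, y))
    target = max(1, round(len(chunk) / len(by_label)))
    balanced = []
    for items in by_label.values():
        if len(items) >= target:
            balanced.extend(rng.sample(items, target))  # random undersampling
        else:
            balanced.extend(items)  # keep every original minority example
            balanced.extend(rng.choices(items, k=target - len(items)))  # oversample
    rng.shuffle(balanced)
    return balanced


class CentroidClassifier:
    """Hypothetical stand-in base learner: predicts the nearest class mean."""

    def fit(self, data):
        sums, counts = {}, Counter()
        for x, y in data:
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] += 1
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda y: sq_dist(self.centroids[y]))


class WeightedChunkEnsemble:
    """Chunk-based ensemble with accuracy weights and worst-member elimination."""

    def __init__(self, max_members=5, seed=0):
        self.max_members = max_members
        self.rng = random.Random(seed)
        self.members = []  # each entry is [classifier, weight]

    def update(self, chunk):
        if not chunk:
            return
        # Re-weight existing members by accuracy on the newest chunk, so members
        # trained on an outdated concept lose influence after a drift.
        for member in self.members:
            hits = sum(member[0].predict(x) == y for x, y in chunk)
            member[1] = hits / len(chunk)
        # Train a new member on the balanced chunk; it starts with full weight.
        new_clf = CentroidClassifier().fit(hybrid_sample(chunk, self.rng))
        self.members.append([new_clf, 1.0])
        # Elimination: drop the lowest-weighted member when over capacity.
        if len(self.members) > self.max_members:
            worst = min(range(len(self.members)), key=lambda i: self.members[i][1])
            del self.members[worst]

    def predict(self, x):
        votes = defaultdict(float)
        for clf, weight in self.members:
            votes[clf.predict(x)] += weight
        return max(votes, key=votes.get)
```

In use, each incoming batch of labelled `(features, label)` pairs is passed to `update`, and `predict` returns the weighted majority vote. Plain accuracy is dominated by the majority class, so a practical weighting would more likely rely on an imbalance-aware measure; accuracy is used here only to keep the sketch short.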