• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2011, Vol. 33 ›› Issue (12): 99-105.

• Papers •

An Ensemble Classifier for Mining Imbalanced Data Streams with Noise

OUYANG Zhenzheng1, TAO Zijin1, CAI Jianyu2, WU Quanyuan1

  (1. School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, China; 2. National Defense Information Academy, Wuhan, Hubei 430010, China)
  • Received: 2009-06-02  Revised: 2009-09-27  Online: 2011-12-24  Published: 2011-12-25

Abstract:

Many real-world data stream mining applications must learn from imbalanced data streams and need high predictive accuracy on the minority class. However, most classification models assume relatively balanced streams and cannot handle an imbalanced class distribution. In this paper, we propose a novel ensemble classifier framework (IMDAP) for mining concept-drifting and noisy data streams with imbalanced class distributions, built on the averaged probability (AP) ensemble framework and sampling techniques. Our empirical study shows that IMDAP adapts better than the AP ensemble to imbalanced data streams with concept drift and noise: its overall classification performance is better, it clearly improves classification accuracy on the minority class, and its time complexity is close to that of AP.

Key words: imbalanced data streams; concept drift; noise; ensemble classifier
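
The abstract names two ingredients, an averaged probability ensemble and a sampling technique, without giving their details. The following is only a minimal sketch of how the two might fit together, not the IMDAP algorithm itself. It assumes chunk-wise processing of the stream, random undersampling of the majority class within each chunk, a fixed-size window of ensemble members, and a stand-in centroid-based base learner. All class and function names (`CentroidModel`, `undersample_majority`, `AveragedProbabilityEnsemble`) are hypothetical.

```python
import math
import random


class CentroidModel:
    """Stand-in base learner (an assumption): one centroid per class,
    scored as unit-variance Gaussians. Label 1 is the minority class."""

    def fit(self, examples):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, y in examples if y == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def prob_minority(self, x):
        # Squared distance from x to each class centroid.
        d = {label: sum((a - b) ** 2 for a, b in zip(x, c))
             for label, c in self.centroids.items()}
        z = 0.5 * (d[1] - d[0])  # log-odds in favour of the majority class
        if z >= 0:               # numerically stable logistic function
            e = math.exp(-z)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(z))


def undersample_majority(chunk, rng):
    """Keep every minority example and an equal-sized random majority subset."""
    minority = [(x, y) for x, y in chunk if y == 1]
    majority = [(x, y) for x, y in chunk if y == 0]
    keep = min(len(majority), len(minority))
    return minority + rng.sample(majority, keep)


class AveragedProbabilityEnsemble:
    """Averages the minority-class probabilities of the most recent members."""

    def __init__(self, max_members=5, seed=0):
        self.members = []
        self.max_members = max_members
        self.rng = random.Random(seed)

    def update(self, chunk):
        balanced = undersample_majority(chunk, self.rng)
        if {y for _, y in balanced} != {0, 1}:
            return  # skip chunks that lack one of the two classes
        self.members.append(CentroidModel().fit(balanced))
        # Dropping the oldest members is one simple way to follow concept drift.
        self.members = self.members[-self.max_members:]

    def prob_minority(self, x):
        return sum(m.prob_minority(x) for m in self.members) / len(self.members)

    def predict(self, x):
        return int(self.prob_minority(x) >= 0.5)


# Usage on a small hand-made chunk: 20 majority points, 3 minority points.
chunk = [([0.1 * i, -0.1 * i], 0) for i in range(20)]
chunk += [([5.0, 5.0], 1), ([5.5, 4.5], 1), ([4.5, 5.5], 1)]
ensemble = AveragedProbabilityEnsemble(max_members=3)
for _ in range(4):
    ensemble.update(chunk)
print(ensemble.predict([5.0, 5.0]), ensemble.predict([0.0, 0.0]))
```

Averaging probabilities rather than taking a majority vote lets a confident member outweigh several uncertain ones. Balancing each chunk before training keeps the minority class from being swamped in any single member.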