• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2014, Vol. 36 ›› Issue (05): 977-985.

• Paper •

Ensemble classification method for data streams based on feature drift

张育培,刘树慧   

  1. (School of Information Engineering, Zhengzhou University, Zhengzhou, Henan 450052)
  • Received: 2012-12-17 Revised: 2013-05-18 Online: 2014-05-25 Published: 2014-05-25

Ensemble classification based on feature drifting in data streams

ZHANG Yupei,LIU Shuhui   

  1. (School of Information Engineering, Zhengzhou University, Zhengzhou 450052, China)
  • Received: 2012-12-17 Revised: 2013-05-18 Online: 2014-05-25 Published: 2014-05-25

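Abstract (translated from Chinese):

To build a more effective classifier for data streams with hidden concept drift, and following the theory that different data features differ in how critical they are to classification, this paper proposes an ensemble classification method for data streams based on feature drift (ECFD). First, the concept of feature drift and its relationship to concept drift are given. Then, using mutual information theory, an unsupervised feature selection technique suited to data streams (UFF) is proposed, which extracts a critical feature subset in order to detect feature drift. Finally, base classification algorithms capable of handling concept drift are selected, and a heterogeneous ensemble classifier is built on the critical feature subsets. The method demonstrates a new approach to classifying high-dimensional data streams with hidden concept drift. Extensive experimental results show that the method performs well in accuracy, running speed, and scalability, especially on high-dimensional data streams.

Keywords: feature selection, feature drift, concept drift, data stream, mutual information, ensemble classifier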

Abstract:

To construct a more effective classifier for data streams with hidden concept drifting, and based on the premise that different features differ in their importance for classification, an Ensemble Classifier for Feature Drifting in data streams (ECFD) is proposed. First, the definition of feature drifting and its relationship to concept drifting are given. Second, mutual information theory is used to develop an Unsupervised Feature Filter (UFF) technique, which extracts critical feature subsets in order to detect feature drifting. Finally, base classification algorithms capable of handling concept drifting are chosen to build a heterogeneous ensemble classifier on the critical feature subsets. The method offers a new approach to classifying high-dimensional data streams with hidden concept drifting. Experimental results show that the method performs well in accuracy, speed, and scalability, especially on high-dimensional data streams.

Key words: feature selection; feature drifting; concept drifting; data stream; mutual information; ensemble classifier
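
The abstract does not spell out how UFF scores features or how a feature drift is flagged, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It assumes discrete feature values, scores each feature by its mean mutual information with the other features in a chunk, keeps the top k, and treats a change in the selected subset between consecutive chunks as a feature drift. The function names `mutual_information`, `select_features`, and `feature_drift`, the scoring rule, and the drift test are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_features(chunk, k):
    """Assumed unsupervised filter (not the paper's UFF): rank features by
    mean mutual information with the other features and keep the top k.
    Expects a chunk of equal-length rows with at least two features."""
    columns = list(zip(*chunk))  # rows -> per-feature columns
    d = len(columns)
    scores = []
    for i in range(d):
        total = sum(mutual_information(columns[i], columns[j])
                    for j in range(d) if j != i)
        scores.append(total / (d - 1))
    ranked = sorted(range(d), key=lambda i: scores[i], reverse=True)
    return frozenset(ranked[:k])


def feature_drift(prev_subset, curr_subset):
    """Assumed drift test: the selected feature subset has changed."""
    return prev_subset != curr_subset


# Two toy chunks: features 0 and 1 carry the signal in the first chunk,
# features 1 and 2 carry it in the second.
chunk_a = [(0, 0, 5), (1, 1, 5), (0, 0, 5), (1, 1, 5)]
chunk_b = [(5, 0, 0), (5, 1, 1), (5, 0, 0), (5, 1, 1)]

subset_a = select_features(chunk_a, k=2)
subset_b = select_features(chunk_b, k=2)
print(sorted(subset_a), sorted(subset_b), feature_drift(subset_a, subset_b))
```

In a full pipeline along the lines the abstract describes, a detected change in the subset would be the point at which the ensemble's base classifiers are rebuilt or reweighted on the new critical features; that step is omitted here because the abstract gives no details on it.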