• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)

• Artificial Intelligence and Data Mining

A data stream classification algorithm based on adaptive random forest

ZHANG Xin-yu (张馨予), AN Jian-cheng (安建成), CAO Rui (曹锐)

  1. (School of Software, Taiyuan University of Technology, Jinzhong, Shanxi 030600)
     
  • Received: 2019-08-04 Revised: 2019-11-01 Online: 2020-03-25 Published: 2020-03-25
  • Supported by:

    National Natural Science Foundation of China (61741212)

A data stream classification algorithm based on adaptive random forest ensemble model

ZHANG Xin-yu, AN Jian-cheng, CAO Rui

  1. (School of Software, Taiyuan University of Technology, Jinzhong 030600, China)
  • Received: 2019-08-04 Revised: 2019-11-01 Online: 2020-03-25 Published: 2020-03-25

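Abstract (translated from the Chinese):

The adaptive random forest classifier places a warning detector and a drift detector on every base classifier. While instances are being trained on, several warning detectors are frequently triggered at once, which causes several background trees to be trained in parallel and leads to high memory use and long running times. To address this problem, an improved adaptive random forest ensemble classification algorithm is proposed: the concept drift detector is placed at the ensemble learner, the drift detectors at the individual base trees are removed, and the number of background trees to be trained is determined from the prediction accuracy of the ensemble. When the improved algorithm is used to classify relatively balanced data streams, it preserves classification performance while reducing running time and memory consumption compared with the original algorithm, and it adapts more quickly to concept drift appearing in the data stream.

Key words (translated from the Chinese): data stream, concept drift, random forest, drift detector, ensemble classifier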

Abstract:

The adaptive random forest classifier attaches a warning detector and a drift detector to each base classifier. During training, multiple warning detectors are often triggered at the same time, so multiple background trees are trained simultaneously, which requires a large amount of memory and a long running time. To address this problem, this paper proposes an improved adaptive random forest ensemble classification algorithm. It places the concept drift detector at the ensemble level, removes the drift detectors from the individual base trees, and determines the number of background trees to train according to the prediction accuracy of the ensemble. When the improved algorithm is applied to relatively balanced data streams, it maintains classification performance while reducing running time and memory consumption compared with the original algorithm, and it adapts more quickly to concept drift in the data stream.

Key words: data stream, concept drift, random forest, drift detector, ensemble classifier
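
The scheme outlined in the abstract can be illustrated with a rough, self-contained Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the class names `EnsembleLevelARF`, `DDM`, and `MajorityLearner` are hypothetical; a DDM-style error-rate detector stands in for whatever detector the paper uses; a majority-class learner stands in for a Hoeffding tree; the rule `ceil((1 - accuracy) * n_trees)` for the number of background trees is a placeholder, since the abstract does not state the actual mapping; and the sketch assumes that warning detection also moves to the ensemble level, which the abstract does not spell out. Online bagging and random feature subsets are omitted.

```python
import math
from collections import Counter


class MajorityLearner:
    """Trivial stand-in for a Hoeffding tree: predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


class DDM:
    """DDM-style detector over a stream of 0/1 errors (an assumed detector choice)."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = self.s_min = float("inf")

    def update(self, error):
        """Consume one 0/1 error and return 'stable', 'warning', or 'drift'."""
        self.n += 1
        self.errors += error
        if self.n < self.min_samples:
            return "stable"
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + 3 * self.s_min:
            return "drift"
        if p + s > self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"


class EnsembleLevelARF:
    """One detector for the whole ensemble; base trees carry no detectors."""

    def __init__(self, n_trees=10, make_learner=MajorityLearner):
        self.make_learner = make_learner
        self.trees = [make_learner() for _ in range(n_trees)]
        self.tree_hits = [0] * n_trees  # per-tree correct counts since last drift
        self.background = []
        self.detector = DDM()
        self.seen = self.correct = self.drifts = 0

    def predict(self, x):
        votes = Counter(t.predict(x) for t in self.trees)
        votes.pop(None, None)  # ignore trees that have not seen any data yet
        return votes.most_common(1)[0][0] if votes else None

    def n_background(self):
        # Hypothetical rule: the lower the ensemble accuracy, the more background trees.
        accuracy = self.correct / self.seen if self.seen else 0.0
        return max(1, math.ceil((1 - accuracy) * len(self.trees)))

    def learn(self, x, y):
        y_hat = self.predict(x)  # test-then-train: predict before learning
        self.seen += 1
        self.correct += int(y_hat == y)
        state = self.detector.update(int(y_hat != y))
        if state == "warning" and not self.background:
            self.background = [self.make_learner() for _ in range(self.n_background())]
        elif state == "drift":
            new = self.background or [self.make_learner() for _ in range(self.n_background())]
            # Replace the trees with the fewest correct predictions.
            worst = sorted(range(len(self.trees)), key=lambda i: self.tree_hits[i])[:len(new)]
            for i, tree in zip(worst, new):
                self.trees[i] = tree
            self.tree_hits = [0] * len(self.trees)
            self.background = []
            self.detector.reset()
            self.seen = self.correct = 0
            self.drifts += 1
        for i, tree in enumerate(self.trees):
            self.tree_hits[i] += int(tree.predict(x) == y)
            tree.learn(x, y)
        for tree in self.background:
            tree.learn(x, y)
```

Because a single detector watches the ensemble's own error stream, at most one batch of background trees is started per warning, which is the source of the memory and time savings the abstract describes. A real experiment would swap `MajorityLearner` for an incremental decision tree and tune the detector; the stand-ins here only keep the sketch runnable without external libraries.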