• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

Parallel anomaly detection based on Isolation Forest

HOU Yongxu1,DUAN Lei1,2,QIN Jianglong3,QIN Pan1,TANG Changjie1
  

  1. (1.School of Computer Science,Sichuan University,Chengdu 610065;
    2.West China School of Public Health,Sichuan University,Chengdu 610041;
    3.School of Software,Yunnan University,Kunming 650091,China) 
  • Received: 2016-09-11 Revised: 2016-11-05 Online: 2017-02-25 Published: 2017-02-25

Abstract:

Anomaly detection is used in a wide variety of applications and attracts attention in both industry and academia. Among the many methods for anomaly detection, the Isolation Forest algorithm, which combines high efficiency with sound detection accuracy, is widely applied in practice. However, the conventional Isolation Forest algorithm can hardly handle large-scale data sets. To overcome this limitation, we propose an algorithm built on a cloud computing platform. Specifically, we design and implement PIFH, a parallel anomaly detection algorithm based on Isolation Forest, using the Hadoop distributed storage system and the MapReduce distributed computing framework. Parallelizing both the construction of the detection model and the evaluation of anomalies improves efficiency and extends the range of data the method can be applied to. Experiments on real-world data sets demonstrate that the proposed algorithm is efficient and scalable.
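
The two parallelized stages named in the abstract, model construction and anomaly evaluation, can be illustrated with a minimal single-process sketch. This is not the PIFH implementation: the abstract does not give its design, so the split into map and reduce steps below, along with every function name and parameter (`map_build_tree`, `map_path_lengths`, `reduce_score`, `detect`, `n_trees`, `sample_size`), is an assumption made for illustration. The sketch uses the standard Isolation Forest scoring rule and plain Python in place of Hadoop and MapReduce.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path_length(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Isolate points recursively with random axis-parallel splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # no split possible on this attribute
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x):
    """Depth at which x reaches a leaf, adjusted for points left unsplit there."""
    depth = 0
    while "split" in tree:
        tree = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
        depth += 1
    return depth + avg_path_length(tree["size"])


def map_build_tree(seed, data, sample_size):
    """Hypothetical map task 1: build one tree from an independent subsample."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    max_depth = math.ceil(math.log2(max(len(sample), 2)))
    return build_tree(sample, 0, max_depth, rng)


def map_path_lengths(record, forest):
    """Hypothetical map task 2: path length of one record in every tree."""
    return [path_length(tree, record) for tree in forest]


def reduce_score(lengths, sample_size):
    """Hypothetical reduce task: combine path lengths into an anomaly score."""
    mean = sum(lengths) / len(lengths)
    return 2.0 ** (-mean / avg_path_length(sample_size))


def detect(data, n_trees=50, sample_size=64, seed=0):
    """Run both stages in sequence; assumes data holds at least two records."""
    psi = min(sample_size, len(data))
    # Each tree depends only on its own seed, so these calls could run in parallel.
    forest = [map_build_tree(seed + i, data, psi) for i in range(n_trees)]
    # Each record is scored independently, so these calls could run in parallel too.
    return [reduce_score(map_path_lengths(r, forest), psi) for r in data]
```

Because each tree is built from its own subsample and each record is scored independently of the others, both loops in `detect` are free of shared state, which is the property that makes the two stages amenable to distribution. Scores lie in (0, 1], and values closer to 1 indicate records that are isolated after fewer splits.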
 

Key words: anomaly detection, cloud computing, parallelization