• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2020, Vol. 42 ›› Issue (08): 1440-1447.

• Artificial Intelligence and Data Mining •

A semi-supervised outlier detection model based on autoencoder and integrated learning

XIA Huo-song, SUN Ze-lin

  1. (School of Management, Wuhan Textile University, Wuhan 430073, China)

  • Received: 2019-12-31 Revised: 2020-03-11 Accepted: 2020-08-25 Online: 2020-08-25 Published: 2020-08-29
  • Funding:
    National Natural Science Foundation of China (71871172, 71571139)

Abstract: Outlier detection, which is used to preprocess data and to mine information from anomalous records, is an important data mining method. Because of the curse of dimensionality, detecting outliers in high-dimensional data is difficult. To address this problem, a semi-supervised outlier detection algorithm based on an autoencoder and ensemble learning is proposed. First, an autoencoder reduces the dimensionality of the data; during encoding and decoding, the degree of abnormality of the outlier data is amplified. Second, because the iForest, LOF, and K-means algorithms are sensitive to different types of outliers, they are fused within the AdaBoost boosting framework to improve detection accuracy. Experiments are conducted on high-dimensional outlier datasets from the UCI Machine Learning Repository. The results show that the proposed model is significantly more accurate than current mainstream outlier detection algorithms.

Key words: outlier detection, boosting framework, semi-supervised, autoencoder
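
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the three detectors are fused inside AdaBoost, so everything below is an assumption made for illustration. In particular, the autoencoder is approximated by a scikit-learn `MLPRegressor` trained to reproduce its input, the fusion step feeds the three anomaly scores to `AdaBoostClassifier` (whose default decision stumps each threshold a single detector's score), the data are synthetic rather than taken from UCI, and the names `encode` and `detector_scores` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier, IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler


def encode(X, n_hidden=8, seed=0):
    """Train a one-hidden-layer autoencoder (X -> X) and return the hidden codes.

    Assumption: MLPRegressor stands in for the paper's autoencoder.
    """
    ae = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation="tanh",
                      max_iter=500, random_state=seed)
    ae.fit(X, X)
    # Recompute the hidden-layer activations from the learned weights.
    return np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])


def detector_scores(Z, seed=0):
    """Return one anomaly score per detector; higher means more anomalous."""
    iso = IsolationForest(random_state=seed).fit(Z)
    s_iso = -iso.score_samples(Z)
    lof = LocalOutlierFactor(n_neighbors=20).fit(Z)
    s_lof = -lof.negative_outlier_factor_
    km = KMeans(n_clusters=3, n_init=10, random_state=seed).fit(Z)
    s_km = km.transform(Z).min(axis=1)  # distance to the nearest centroid
    return np.column_stack([s_iso, s_lof, s_km])


# Synthetic stand-in data: 300 normal points and 15 shifted outliers in 20 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(300, 20)),
               rng.normal(4, 1, size=(15, 20))])
X = StandardScaler().fit_transform(X)
y = np.r_[np.zeros(300, dtype=int), np.ones(15, dtype=int)]

codes = encode(X)                # step 1: reduce dimensionality
scores = detector_scores(codes)  # step 2: score with iForest, LOF, K-means

# Semi-supervised step (assumed): only a small labeled subset trains the booster.
labeled_idx = np.r_[0:40, 300:305]
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(scores[labeled_idx], y[labeled_idx])
pred = clf.predict(scores)       # 1 = predicted outlier, 0 = predicted normal
```

Treating each detector's score as a feature lets the boosting rounds reweight whichever detector best separates the labeled outliers, which is one plausible reading of "fusing" detectors that are sensitive to different outlier types.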