A new outlier detection method based on large data

Computer Engineering & Science



YANG Xiansheng1,JIANG Lei1,PENG Xiong2,ZHOU Qian1,LIU Jujun1

(1. Key Laboratory of Knowledge Processing & Network Manufacturing, Hunan University of Science & Technology, Xiangtan 411201;
2. BBK Commercial Chain Co., Ltd., Xiangtan 411000, China)

Received: 2018-01-16  Revised: 2018-04-01  Online: 2018-07-25  Published: 2018-07-25

Abstract

Outlier detection aims to find abnormal records in massive data and offers two benefits. First, as a data preprocessing step, it reduces the impact of noise on downstream models. Second, in specific application scenarios, it can locate outliers accurately so that the underlying anomalies can be analyzed. Current mainstream methods in China and abroad, such as KNN and ORCA, do not jointly account for global outliers, local outliers and outlier clusters, and they have difficulty handling large-scale data sets. We propose a new outlier detection model built on the Spark platform. To maximize overall detection performance, iForest, LOF and DBSCAN are adopted for their high sensitivity to global outliers, local outliers and outlier clusters, respectively. First, these three base classifiers are selected and their objective functions are modified. Then, the error-rate calculation of the boosting framework is modified and improved, and the classifiers are merged into a new outlier detection model called ILDBOOST. The results show that the model accounts for global outliers, local outliers and outlier clusters, improves overall precision and recall, and performs clearly better than current mainstream outlier detection methods.
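The abstract states that ILDBOOST merges the three base detectors inside a boosting framework with a modified error-rate calculation, but it does not give that modification. For orientation only, the following Python sketch shows the standard discrete AdaBoost weighting, alpha = 0.5 * ln((1 - err) / err), applied to the fixed outputs of three detectors, followed by a weighted vote. It assumes a labelled validation set, predictions encoded as +1 (outlier) and -1 (inlier), and hypothetical helper names `boost_detectors` and `combine`; the prediction vectors are invented toy data, not real iForest, LOF or DBSCAN output. It is not the authors' ILDBOOST algorithm.

```python
import math
from typing import Dict, List, Sequence

# Encoding (an assumption of this sketch): +1 = outlier, -1 = inlier.


def boost_detectors(preds: Dict[str, Sequence[int]],
                    labels: Sequence[int]) -> Dict[str, float]:
    """Give each precomputed detector an AdaBoost-style weight (alpha).

    Detectors are visited in dict order; after each one, sample weights are
    updated so later detectors are judged mostly on the points missed so far.
    """
    n = len(labels)
    w = [1.0 / n] * n  # start from uniform sample weights
    alphas: Dict[str, float] = {}
    for name, p in preds.items():
        # Weighted error rate under the current sample weights (weights sum to 1).
        err = sum(wi for wi, pi, yi in zip(w, p, labels) if pi != yi)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        alphas[name] = alpha
        # Misclassified points (pi * yi == -1) grow, correct ones shrink; then renormalise.
        w = [wi * math.exp(-alpha * pi * yi) for wi, pi, yi in zip(w, p, labels)]
        total = sum(w)
        w = [wi / total for wi in w]
    return alphas


def combine(preds: Dict[str, Sequence[int]], alphas: Dict[str, float]) -> List[int]:
    """Weighted vote: sign of sum(alpha_k * prediction_k) for each point."""
    n = len(next(iter(preds.values())))
    scores = [sum(alphas[name] * p[i] for name, p in preds.items()) for i in range(n)]
    return [1 if s > 0 else -1 for s in scores]


# Toy validation data (invented): points 4 and 5 are true outliers.
labels = [-1, -1, -1, -1, 1, 1]
preds = {
    "iforest": [-1, -1, -1, 1, 1, 1],  # false positive at index 3
    "lof": [-1, -1, -1, -1, 1, -1],    # misses the outlier at index 5
    "dbscan": [-1, 1, -1, -1, 1, 1],   # false positive at index 1
}

alphas = boost_detectors(preds, labels)
combined = combine(preds, alphas)
print(alphas)
print(combined)  # [-1, -1, -1, -1, 1, 1]
```

Because each round up-weights the points the earlier detectors got wrong, the detector that is accurate on exactly those points (here the third one) receives the largest weight, and the weighted vote recovers the individual mistakes at indices 1, 3 and 5. Where ILDBOOST changes the error-rate formula, only the `err` line above would differ.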

Key words: outlier detection, blending and stacking, business big data, boosting framework

YANG Xiansheng1,JIANG Lei1,PENG Xiong2,ZHOU Qian1,LIU Jujun1. A new outlier detection method based on large data[J]. Computer Engineering & Science.

