• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

An interpolation based outlier detection method of sparse high-dimensional data

CHEN Wang-hu, TIAN Zhen, ZHANG Li-zhi, LIANG Xiao-yan, GAO Ya-qiong

  1. (College of Computer Science & Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2019-04-22  Revised: 2019-12-11  Online: 2020-06-25  Published: 2020-06-25

Abstract:

Data in an outlier detection problem can be viewed as a mixture of normal and abnormal points in a space. Provided that the loss of normal points is kept low, outliers usually lie in the sample sets farthest from all cluster centroids. Building on this idea, this paper proposes an interpolation-based outlier detection method for sparse high-dimensional data. On top of k-means clustering, the method uses a genetic algorithm to interpolate the original data, which addresses the tendency of sparse data to be merged during k-means clustering. Experimental results show that, compared with traditional outlier detection methods based on k-means clustering and several typical detection methods based on improved k-means clustering, the proposed method not only loses fewer normal points but also improves the accuracy and precision of detection.
 

Key words: sparse data, outlier detection, interpolation, clustering, genetic algorithm
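
The idea the abstract starts from, that outliers tend to lie farthest from all cluster centroids, can be illustrated with a minimal sketch of the traditional k-means baseline. This is not the proposed method: the genetic-algorithm interpolation step is not reproduced here, since the abstract gives no details of it. The function names, the choice to initialise centroids from the first k rows, and the fixed iteration count are all assumptions made for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=50):
    """Plain Lloyd's k-means. Assumption: centroids start at the first k rows."""
    X = np.asarray(X, dtype=float)
    centroids = X[:k].copy()
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def farthest_from_centroids(X, k, n_outliers):
    """Return indices of the n_outliers points farthest from their nearest centroid."""
    X = np.asarray(X, dtype=float)
    centroids = kmeans(X, k)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    score = dists.min(axis=1)  # distance to the nearest centroid
    return np.argsort(score)[::-1][:n_outliers]


if __name__ == "__main__":
    # Two tight groups plus one isolated point (hypothetical toy data).
    X = np.array([[0, 0], [10, 10], [0, 1], [1, 0], [1, 1],
                  [10, 11], [11, 10], [11, 11], [30, 0]], dtype=float)
    print(farthest_from_centroids(X, k=2, n_outliers=1))
```

In this toy data the isolated point is absorbed into the nearer cluster and pulls that centroid toward itself, which hints at the merging problem the abstract mentions for sparse data.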