• Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• Artificial Intelligence and Data Mining

An outlier detection algorithm based on improved OPTICS clustering and LOPW

XIAO Xue,XUE Shanliang   

  1. (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China)

  • Received: 2018-06-05 Revised: 2018-08-15 Online: 2019-05-25 Published: 2018-05-25

Abstract:

To address the high time complexity and poor detection quality of existing outlier detection algorithms, we propose a new outlier detection algorithm based on improved OPTICS clustering and LOPW. First, the original dataset is preprocessed with an improved OPTICS clustering algorithm, and a preliminary outlier dataset is obtained by filtering the reachability plot produced by the clustering. Then, a newly defined local outlier factor based on P-weight (LOPW) is used to compute the outlier degree of each object in the preliminary outlier dataset. When computing distances, the leave-one-out partition information entropy gain is introduced to determine the weight of each feature, which improves the accuracy of outlier detection. Experimental results show that the improved algorithm improves both computational efficiency and outlier detection precision.
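The second stage described above (entropy-weighted distances feeding a local outlier factor computed on a preliminary candidate set) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' LOPW: the abstract does not define the P-weight or the exact leave-one-out entropy formula. The sketch assumes (1) equal-width discretization of each attribute, (2) attribute weights equal to the entropy lost when that attribute is left out of the partition that groups identical discretized rows, normalized to sum to 1, and (3) the standard LOF score computed with the resulting weighted Euclidean distance. The OPTICS pre-filtering step is not implemented; the `candidates` argument stands in for the preliminary outlier set (one could, for instance, keep points whose scikit-learn `OPTICS.reachability_` value exceeds a chosen threshold, but that rule is an assumption, not the paper's filter). All function names (`discretize`, `partition_entropy`, `leave_one_out_weights`, `weighted_dist`, `lof_scores`) are hypothetical.

```python
import math
from collections import Counter


def discretize(points, bins=3):
    """Map each attribute to an equal-width bin index (an assumed discretization)."""
    m = len(points[0])
    cols = []
    for j in range(m):
        lo = min(p[j] for p in points)
        hi = max(p[j] for p in points)
        width = (hi - lo) / bins or 1.0  # constant attribute: every value falls in bin 0
        cols.append([min(int((p[j] - lo) / width), bins - 1) for p in points])
    return [tuple(col[i] for col in cols) for i in range(len(points))]


def partition_entropy(rows, attrs):
    """Shannon entropy of the partition grouping rows that agree on every attribute in attrs."""
    n = len(rows)
    blocks = Counter(tuple(row[a] for a in attrs) for row in rows)
    return -sum((c / n) * math.log(c / n) for c in blocks.values())


def leave_one_out_weights(rows):
    """Hypothetical weights: entropy lost when one attribute is left out, normalized to sum to 1."""
    m = len(rows[0])
    attrs = list(range(m))
    h_full = partition_entropy(rows, attrs)
    gains = [h_full - partition_entropy(rows, [a for a in attrs if a != j]) for j in attrs]
    total = sum(gains)
    if total <= 0:
        return [1.0 / m] * m  # no attribute is individually indispensable: use equal weights
    return [g / total for g in gains]


def weighted_dist(p, q, w):
    """Weighted Euclidean distance."""
    return math.sqrt(sum(wj * (a - b) ** 2 for wj, a, b in zip(w, p, q)))


def lof_scores(points, k, w, candidates):
    """Standard LOF with a weighted distance, reported only for the candidate indices."""
    n = len(points)
    d = [[weighted_dist(points[i], points[j], w) for j in range(n)] for i in range(n)]
    # k nearest neighbours of each point, excluding itself (ties at the k-th distance are cut)
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k] for i in range(n)]
    k_dist = [d[i][knn[i][-1]] for i in range(n)]

    def lrd(i):
        # local reachability density: inverse of the mean reachability distance to the neighbours
        reach = sum(max(k_dist[o], d[i][o]) for o in knn[i])
        return len(knn[i]) / reach if reach > 0 else math.inf

    lrds = [lrd(i) for i in range(n)]
    return {i: sum(lrds[o] for o in knn[i]) / (len(knn[i]) * lrds[i]) for i in candidates}


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.2), (1.2, 1.0), (5.0, 5.0)]
    w = leave_one_out_weights(discretize(data))
    scores = lof_scores(data, k=3, w=w, candidates=range(len(data)))
    print(max(scores, key=scores.get))  # index of the highest-scoring (most outlying) point
```

In the usage example the isolated point `(5.0, 5.0)` receives the highest score. Restricting `candidates` to a subset, such as the points left after a clustering pre-filter, changes only which scores are returned; for simplicity the sketch still computes all pairwise distances and the density of every point.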

 

Key words: LOF algorithm, outlier detection, OPTICS clustering, information entropy, weighted distance