• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

J4 ›› 2010, Vol. 32 ›› Issue (9): 130-133. doi: 10.3969/j.issn.1007130X.2010.

• Articles •

An Outlier Recognition Algorithm Based on Grid Adjacency Relation

LI Guangxing1,2, YANG Yan2

  1. Department of Fundamental Courses, Chengdu Vocational College of Agricultural Science and Technology, Chengdu, Sichuan 611130, China
  2. School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 610031, China
  • Received: 2010-03-13 Revised: 2010-06-10 Online: 2010-09-02 Published: 2010-09-02
  • About the authors: LI Guangxing (b. 1956), male, from Chengdu, Sichuan, associate professor; research interest: data mining. YANG Yan, Ph.D., professor; research interests: computational intelligence and data mining.

Abstract:

Outliers are data points that deviate from the other observed objects. Based on the observation that the density of the grid cell (unit) containing an outlier may be higher or lower than that of its adjacent cells, this paper proposes an outlier recognition algorithm based on grid adjacency relation (GAO). The algorithm measures the degree of deviation between cells using their relative density and the distance between their centroids, and then determines the outlier cells and the outliers from that degree of deviation. Experimental results show that the algorithm effectively recognizes outliers in multi-density, high-dimensional, and large data sets, and that it is more efficient than the Cell-based algorithm.

Key words: adjacent units; diversity function; outlier
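
The abstract names the ingredients of GAO (grid cells, relative density between adjacent cells, distance between cell centroids) but does not give the deviation formula. The following Python sketch shows one way these ingredients could fit together. Everything specific in it is an assumption made for illustration, not the paper's definition: the function name `grid_outliers`, the parameters `cell_size` and `threshold`, the use of the point count as cell density, and the deviation score (density ratio multiplied by the centroid distance in units of the cell size).

```python
from collections import defaultdict
from itertools import product
import math


def grid_outliers(points, cell_size, threshold):
    """Return points lying in cells that deviate from every adjacent cell.

    Illustrative sketch only: the deviation score below is an assumed
    formula, not the one defined in the GAO paper.
    """
    # Assign each point to the grid cell that contains it.
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(p)
    if not cells:
        return []

    # Centroid of each non-empty cell; the point count acts as its density.
    centroids = {
        key: tuple(sum(axis) / len(members) for axis in zip(*members))
        for key, members in cells.items()
    }

    # Offsets to all adjacent cells (every direction except "stay put").
    dims = len(next(iter(cells)))
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]

    outliers = []
    for key, members in cells.items():
        deviations = []
        for off in offsets:
            nb = tuple(k + o for k, o in zip(key, off))
            if nb not in cells:
                continue  # empty neighbours contribute nothing
            a, b = len(members), len(cells[nb])
            # Symmetric ratio: catches cells denser OR sparser than a neighbour.
            relative_density = max(a, b) / min(a, b)
            centroid_gap = math.dist(centroids[key], centroids[nb]) / cell_size
            deviations.append(relative_density * centroid_gap)
        # Assumed rule: a cell is an outlier cell if it has no non-empty
        # neighbour or differs from all of them by more than the threshold.
        if not deviations or min(deviations) > threshold:
            outliers.extend(members)
    return outliers
```

Taking the minimum deviation over the neighbours means that a cell is flagged only when it resembles none of its adjacent cells, so a dense cluster cell that borders a single sparse cell is not flagged. The number of adjacent cells grows as 3^d - 1 with the dimension d, so this naive enumeration is practical only in low dimensions.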