• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• High Performance Computing

Detecting outliers in data stream based on grid coupling

YANG Jie, ZHANG Dong-yue, ZHOU Li-hua, HUANG Hao, DING Hai-yan

  1. (School of Information Science & Engineering, Yunnan University, Kunming 650504, China)
  • Received: 2019-09-09 Revised: 2019-11-04 Online: 2020-01-25 Published: 2020-01-25
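  • Funding:

    National Natural Science Foundation of China (61762090, 61966036, 61662086); Natural Science Foundation of Yunnan Province (2016FA026); Innovative Research Team Project of Yunnan Province (2018HC019); National Social Science Fund of China (18XZZ005); Science and Technology Innovation Team Project of Higher Education Institutions in Yunnan Province (IRTSTYN)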

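Abstract:

Grid-based data analysis methods process data in units of grids, which avoids point-to-point computation between data objects and greatly improves the efficiency of data analysis. However, traditional grid-based methods process each grid independently and ignore the coupling relationships between grids, which reduces analysis accuracy. In this paper, grids are no longer processed independently when they are used to detect outliers in data streams; instead, the coupling relationships between grids are taken into account, and a grid-coupling-based outlier detection algorithm for data streams, GCStream-OD, is proposed. The algorithm uses grid coupling to accurately express the correlation between data stream objects and uses a pruning strategy to improve its efficiency. Experimental results on five real datasets show that GCStream-OD achieves high outlier detection quality and efficiency.

Key words: outlier detection, data stream, grid coupling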

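The abstract contrasts judging each grid cell independently with taking the coupling between cells into account, but it does not define GCStream-OD's coupling measure, window model or pruning rule. The Python sketch below is therefore only an illustration under stated assumptions, not the authors' algorithm: points in a sliding window are mapped to cells of side `width`, a cell's density is its own count plus `alpha` times the counts of its adjacent cells (a hypothetical coupling), and a cell already holding at least `min_density` points is pruned from the neighbour computation. All names (`cell_of`, `neighbours`, `coupled_density`, `detect_outliers`, `StreamDetector`) and parameters are hypothetical.

```python
"""Illustrative grid-based outlier check with neighbour coupling (not GCStream-OD)."""
import math
from collections import Counter, deque
from itertools import product
from typing import Dict, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def cell_of(point: Point, width: float) -> Cell:
    """Map a point to the integer coordinates of the grid cell containing it."""
    return tuple(math.floor(x / width) for x in point)


def neighbours(cell: Cell) -> Iterator[Cell]:
    """Yield the 3**d - 1 cells that touch `cell` (faces, edges, corners)."""
    for offset in product((-1, 0, 1), repeat=len(cell)):
        if any(offset):  # skip the all-zero offset, i.e. the cell itself
            yield tuple(c + o for c, o in zip(cell, offset))


def coupled_density(cell: Cell, counts: Counter, alpha: float) -> float:
    """Hypothetical coupling: own count plus alpha times the neighbours' counts."""
    # Counter returns 0 for absent cells without inserting them.
    return counts[cell] + alpha * sum(counts[n] for n in neighbours(cell))


def detect_outliers(window: Sequence[Point], width: float,
                    min_density: float, alpha: float = 0.5) -> List[Point]:
    """Return the points of `window` that lie in low-density cells.

    With alpha = 0 every cell is judged on its own (independent grids);
    with alpha > 0 neighbouring cells contribute to each cell's density.
    """
    counts = Counter(cell_of(p, width) for p in window)
    sparse: Dict[Cell, bool] = {}
    for cell, n in counts.items():
        # Pruning (assumed rule): with alpha >= 0 neighbour terms only add
        # density, so a cell that is dense on its own is never an outlier.
        if n >= min_density:
            sparse[cell] = False
            continue
        sparse[cell] = coupled_density(cell, counts, alpha) < min_density
    return [p for p in window if sparse[cell_of(p, width)]]


class StreamDetector:
    """Hypothetical streaming wrapper over a fixed-size sliding window."""

    def __init__(self, size: int, width: float, min_density: float,
                 alpha: float = 0.5) -> None:
        self.window: deque = deque(maxlen=size)  # oldest points drop out
        self.width = width
        self.min_density = min_density
        self.alpha = alpha

    def push(self, point: Point) -> List[Point]:
        """Add one arriving point and return the outliers in the current window."""
        self.window.append(point)
        return detect_outliers(self.window, self.width, self.min_density, self.alpha)


if __name__ == "__main__":
    cluster = [(0.1, 0.1), (0.2, 0.3), (0.5, 0.5), (0.7, 0.2), (0.9, 0.9)]
    pts = cluster + [(1.05, 0.5), (5.5, 5.5)]
    print(detect_outliers(pts, width=1.0, min_density=3, alpha=0.0))
    # [(1.05, 0.5), (5.5, 5.5)]  <- independent grids flag the border point
    print(detect_outliers(pts, width=1.0, min_density=3, alpha=0.5))
    # [(5.5, 5.5)]  <- coupling with the dense neighbour cell clears it
```

In the example, the point at (1.05, 0.5) sits alone in its cell but right next to a dense cluster; treated independently its cell looks sparse, while the coupled density (1 + 0.5 × 5 = 3.5) clears it. Enumerating all 3^d - 1 neighbours grows exponentially with the dimension d, so this sketch is only practical for low-dimensional streams.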