• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (01): 75-85.

• Computer Networks and Information Security •

Anomaly detection of stream data based on grid density stacking

WU Peicheng (武培成), ZHAO Xujun (赵旭俊), JIN Lizhong (靳黎忠)

  1. (College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China)
  • Received: 2023-06-15 Revised: 2023-10-30 Accepted: 2025-01-25 Online: 2025-01-25 Published: 2025-01-18
  • Supported by:
    National Natural Science Foundation of China (61572343, U1931209); Applied Basic Research Program of Shanxi Province (20210302123223, 202103021224275)

Abstract: Most stream data anomaly detection algorithms use a single sliding-window model. As the window slides, a large number of data points are recomputed, and the replacement of neighbors within the window disturbs the assessment of anomaly points, which reduces detection accuracy. To address these problems, a combined window model is proposed that uses several non-overlapping windows as the detection range for anomaly points. Based on this model, an anomaly detection algorithm based on grid density stacking is proposed. First, the kernel density estimation function is optimized and used to calculate the local density of data points. Second, a grid density stacking operation is proposed to measure anomalous grids. Within anomalous grids, the final anomalies are determined by calculating the anomaly score of each data point. To improve efficiency, an adaptive pruning strategy is proposed that prunes regions where anomaly points cannot occur. Experimental results show that the algorithm has clear advantages in both efficiency and accuracy over existing stream data anomaly detection algorithms.

Key words: anomaly detection, stream data, kernel density estimation, grid density stacking
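
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the optimized kernel, the stacking operation, or the adaptive pruning rule, so everything below is an assumption made for illustration. The sketch uses a plain Gaussian kernel, treats "stacking" as summing per-cell densities over the non-overlapping windows, and replaces adaptive pruning with a fixed threshold. All function names and parameters (`split_windows`, `cell_of`, `grid_threshold`, and so on) are hypothetical.

```python
import math
from collections import defaultdict


def split_windows(stream, size):
    """Cut the stream into consecutive, non-overlapping windows."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def cell_of(point, cell_width):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(math.floor(x / cell_width) for x in point)


def gaussian_kde(point, others, bandwidth):
    """Mean Gaussian kernel value between `point` and `others`.

    Assumption: a plain, unnormalised Gaussian kernel stands in for the
    paper's optimized kernel density estimate. The result lies in (0, 1].
    """
    if not others:
        return 0.0
    total = 0.0
    for q in others:
        d2 = sum((a - b) ** 2 for a, b in zip(point, q))
        total += math.exp(-d2 / (2 * bandwidth ** 2))
    return total / len(others)


def stacked_grid_density(windows, cell_width, bandwidth):
    """Assumed reading of 'stacking': sum each cell's density over all windows."""
    stacked = defaultdict(float)
    for window in windows:
        for p in window:
            stacked[cell_of(p, cell_width)] += gaussian_kde(p, window, bandwidth)
    return stacked


def detect(stream, window_size, cell_width, bandwidth,
           grid_threshold, score_threshold):
    """Return the points flagged as anomalies, in stream order."""
    windows = split_windows(stream, window_size)
    stacked = stacked_grid_density(windows, cell_width, bandwidth)
    # Fixed-threshold stand-in for adaptive pruning: dense cells are skipped.
    sparse_cells = {c for c, d in stacked.items() if d < grid_threshold}
    all_points = [p for w in windows for p in w]
    anomalies = []
    for p in all_points:
        if cell_of(p, cell_width) not in sparse_cells:
            continue  # pruned: this cell is too dense to hold anomalies
        score = 1.0 - gaussian_kde(p, all_points, bandwidth)
        if score > score_threshold:
            anomalies.append(p)
    return anomalies


# Toy stream: 20 points clustered near the origin plus one distant point.
demo_stream = [(0.1 * (i % 5), 0.1 * (i // 5)) for i in range(20)]
demo_stream.insert(7, (5.0, 5.0))

result = detect(demo_stream, window_size=7, cell_width=1.0, bandwidth=0.5,
                grid_threshold=1.0, score_threshold=0.5)
print(result)
```

On this toy stream the clustered points all fall in one dense cell and are pruned before scoring, so only the distant point is scored and reported. Because the windows do not overlap, each point's window-level density is computed exactly once, which is the redundancy the combined window model is meant to avoid.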