• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (01): 130-139.

• Graphics and Images •

Video anomaly detection with improved attention hybrid auto-encoder

CHEN Zhaobo1, ZHANG Lin1,2, MA Xiaoxuan1

  1. School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 102616, China
  2. Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing University of Civil Engineering and Architecture, Beijing 102616, China
  • Received: 2023-08-22 Revised: 2024-03-12 Accepted: 2025-01-25 Online: 2025-01-25 Published: 2025-01-18
  • Funding:
    Key Project of the Beijing Municipal Education Science "13th Five-Year" Plan (CHAA19081)

Abstract: Video anomaly detection is an important research topic in computer vision, with wide application in areas such as transportation and public safety. Existing methods have two main weaknesses: a prediction model used alone is easily disturbed by noise, and a reconstruction model used alone can generalize to anomalies. To address these problems, a video anomaly detection method that combines reconstruction and prediction models is proposed. A reconstruction network with an attention mechanism and a memory-augmented module is first trained on normal optical flow data. The reconstructed optical flow and the original video frames are then fed together into a future frame prediction network, where the reconstructed flow serves as a condition that helps the network generate future frames. To extract more effective features, a residual convolutional attention module, SRCAM, is proposed. It helps the reconstruction and prediction networks learn latent-space feature representations at both global and local levels, which strengthens the model's ability to detect anomalous events in videos and improves its robustness. Extensive experiments on two commonly used video anomaly detection datasets, UCSD Ped2 and CUHK Avenue, demonstrate the effectiveness of the proposed method.

Keywords: video anomaly detection, attention mechanism, flow reconstruction, frame prediction, auto-encoder
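
The abstract names a memory module in the reconstruction network but does not say how it is read. The sketch below illustrates one common scheme from memory-augmented auto-encoders, which is an assumption here and not necessarily the authors' design: a latent query is replaced by a softmax-weighted combination of stored "normal" prototypes, so an input unlike anything in memory is reconstructed poorly. The function names, the two-slot memory, and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def memory_read(query, memory):
    """Soft-address the memory (assumed scheme, not taken from the paper).

    Returns the addressing weights (softmax over cosine similarities)
    and the read vector (weighted sum of the memory slots).
    """
    sims = [cosine(query, slot) for slot in memory]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(memory[0])
    read = [sum(w * slot[d] for w, slot in zip(weights, memory)) for d in range(dim)]
    return weights, read


def reconstruction_error(x, x_hat):
    """Mean squared error between an input and its memory-based reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# Hypothetical prototypes standing in for latent codes of normal optical flow.
memory = [[1.0, 0.0], [0.0, 1.0]]

normal_query = [0.9, 0.1]     # close to a stored prototype
unusual_query = [-1.0, -1.0]  # unlike anything in memory

_, normal_read = memory_read(normal_query, memory)
_, unusual_read = memory_read(unusual_query, memory)

print(reconstruction_error(normal_query, normal_read))
print(reconstruction_error(unusual_query, unusual_read))
```

Because the read vector is always a convex combination of the stored prototypes, the unusual query cannot be reproduced and its error is much larger than that of the normal query. This is the property that counters a reconstruction model's tendency to generalize to anomalies.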