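• Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学)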


Video anomaly detection based on a conditional diffusion probabilistic model

Yaqin Ye (叶亚琴), Zijian Tang (汤子健), Jiacheng Niu (牛嘉诚), Xinhuan Zhang (张新欢)

  1. School of Computer Science, China University of Geosciences (Wuhan), Wuhan 430078, China;
  2. Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences (Wuhan), Wuhan 430078, China

  • Online: 2025-06-12 Published: 2025-06-12

Abstract: Video anomaly detection is increasingly important in modern society. Because videos contain diverse ambiguous behaviors and the categories of anomalies cannot be enumerated exhaustively, one-class classification methods struggle to draw the boundary between normal and abnormal. To address this problem, this paper proposes CDiffuVAD, a video anomaly detection model based on a conditional diffusion probabilistic model. First, a normal content generator is designed to improve the content accuracy of the images the model generates: a memory pool strengthens the model's grasp of the distribution of normal samples, and a diffusion probabilistic model learns the complex distribution of video data. Second, implicit motion conditions are introduced to learn the spatiotemporal features of video clips: bidirectional optical flow serves as the implicit motion condition of the diffusion process, and a coordinate normalization method provides coordinate embeddings for the frames of each clip. Together, these allow the model to fit motion trends in multi-frame sequence data and reduce its sensitivity to hard normal information in videos. Experiments show that the proposed method achieves frame-level AUCs of 85.7%, 75.5%, and 65.7% on the Avenue, ShanghaiTech, and UBnormal datasets, respectively, indicating that it captures the diverse features of normal samples and is effective for video anomaly detection.

Key words: video anomaly detection, diffusion model, uncertainty, memory pool, generative model
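
The results in the abstract are reported as frame-level AUC. As a rough illustration of that metric only, the sketch below computes the area under the ROC curve from per-frame anomaly scores and binary ground-truth labels, using the rank-sum (Mann-Whitney) formulation. The function name `frame_level_auc` and the example scores are hypothetical; the abstract does not say how CDiffuVAD derives its per-frame scores, so this is not the authors' evaluation code.

```python
from typing import Sequence


def frame_level_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC over frames: the probability that a randomly chosen
    anomalous frame (label 1) scores higher than a randomly chosen
    normal frame (label 0), counting ties as one half."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one normal and one anomalous frame")

    # Sort frames by score, then assign average ranks to groups of tied scores.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for _, y in pairs[i:j] if y == 1)
        i = j

    # Mann-Whitney U statistic of the positives, normalised to [0, 1].
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical anomaly scores for four frames; the last two are anomalous.
    print(frame_level_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

In practice the scores of all test frames are usually pooled before computing a single AUC per dataset; whether the paper pools or averages per video is not stated in the abstract.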