• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2025, Vol. 47 ›› Issue (10): 1830-1840.

• Graphics and Images •

Research on human pose anomaly detection based on spatiotemporal graph attention state space model

LI Hang, CHEN Zhigang, WANG Yijie, ZHANG Xinyu, LEI Jinghong, LIU Lingfeng

  1. School of Computer Science and Engineering, Central South University, Changsha 410083, China
  2. Big Data Institute, Central South University, Changsha 410083, China

  • Received: 2024-09-13 Revised: 2024-11-14 Online: 2025-10-25 Published: 2025-10-29
  • Supported by:
    Major Special Project Fund of the Changsha Science and Technology Plan (kh2103016)

Abstract: Video anomaly detection is widely applied in public security, transportation, and healthcare. Human pose anomaly detection, however, remains sensitive to environmental conditions, has difficulty handling skeleton time series, incurs high computational complexity, and tends to overlook important local features in motion regions. To address these problems, a novel skeleton-based model is proposed: the spatiotemporal graph normalizing flow mixed attention state space model (STG-FAM). By introducing a selective state space model and a normalizing flow into a spatiotemporal graph convolutional network, the model effectively captures the temporal dynamics of skeleton time series. A mixed attention mechanism learns attention weights across the channel and spatial domains, strengthening the model's focus on key nodes and spatiotemporal edges of the skeleton sequence and improving both its representational capacity and its anomaly detection performance. Experiments on two video anomaly detection datasets, ShanghaiTech Campus and UBnormal, demonstrate the effectiveness of the proposed model.

Key words: video anomaly detection, human skeleton, graph neural network, state space model, attention mechanism
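
To make the idea of attention weights learned across the channel and spatial domains more concrete, the following minimal sketch applies a channel gate and then a per-joint spatial gate to a skeleton feature tensor. It is an illustration only, not the STG-FAM implementation: the function name `mixed_attention`, the `(C, T, V)` tensor layout (channels, frames, joints), the squeeze-and-gate design, and the random stand-in weights are all assumptions introduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mixed_attention(x, w_reduce, w_expand, w_spatial):
    """Hypothetical channel + spatial attention over skeleton features.

    x         : (C, T, V) features (channels, frames, joints); layout is assumed
    w_reduce  : (C // r, C) weights of the channel bottleneck (assumed design)
    w_expand  : (C, C // r) weights restoring the channel dimension
    w_spatial : (2,) mixing weights for the average- and max-pooled maps
    """
    # Channel attention: average over frames and joints, then a two-layer gate.
    squeezed = x.mean(axis=(1, 2))                    # (C,)
    hidden = np.maximum(w_reduce @ squeezed, 0.0)     # ReLU, (C // r,)
    channel_weights = sigmoid(w_expand @ hidden)      # (C,), each in (0, 1)
    x = x * channel_weights[:, None, None]

    # Spatial attention: pool over channels, score every (frame, joint) position.
    avg_pool = x.mean(axis=0)                         # (T, V)
    max_pool = x.max(axis=0)                          # (T, V)
    spatial_weights = sigmoid(w_spatial[0] * avg_pool + w_spatial[1] * max_pool)
    return x * spatial_weights[None, :, :], channel_weights, spatial_weights

# Random stand-ins for learned parameters and for one skeleton sequence.
rng = np.random.default_rng(0)
C, T, V, r = 8, 24, 18, 4
x = rng.standard_normal((C, T, V))
out, cw, sw = mixed_attention(
    x,
    rng.standard_normal((C // r, C)),
    rng.standard_normal((C, C // r)),
    rng.standard_normal(2),
)
print(out.shape, cw.shape, sw.shape)  # (8, 24, 18) (8,) (24, 18)
```

Because both gates lie in (0, 1), they can only attenuate features. In a trained network the weights would be learned so that informative channels and joints are attenuated least, which is one plausible reading of "enhancing the focus on key nodes".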