• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (10): 1814-1821.

• Graphics and Images •

A video human behavior recognition method based on improved 3D ResNet

NIU Wei-hua 1,2, ZHAI Rui-bing 1

  1. Department of Computer, North China Electric Power University, Baoding 071003, China;
  2. Engineering Research Center of Intelligent Computing for Complex Energy Systems, Ministry of Education, North China Electric Power University, Baoding 071003, China
  • Received: 2022-08-07  Revised: 2022-11-26  Accepted: 2023-10-25  Online: 2023-10-25  Published: 2023-10-17

Abstract: To address the temporal nature of human behavior in videos, a video human behavior recognition method that combines asymmetric convolution with CBR modules is proposed. The method uses 3D ResNet-50 as the backbone network. First, each of the larger convolution kernels in the network is split into two asymmetric 3D convolution kernels connected in series, which strengthens the extraction of local key features in the horizontal and vertical directions. Second, CBR modules are added to increase the depth of the network. The network extracts spatial and temporal features from sequences of consecutive video frames, classifies the sequences according to these features, and outputs the recognition result. Extensive experiments on the benchmark dataset UCF101 show that the Top-1 and Top-5 accuracy of the proposed method are 4.03% and 4.99% higher, respectively, than those of the original 3D ResNet, and that its recognition accuracy is also higher than that of other mainstream methods.

Key words: human behavior recognition, 3D convolution, 3D ResNet, asymmetric convolution, UCF101 dataset
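
The kernel splitting described in the abstract can be illustrated with a rough weight count. The sketch below is not taken from the paper: the channel widths (64 in, 64 out), the 3×3×3 kernel, and the particular split into a 3×1×3 kernel followed by a 3×3×1 kernel are assumptions chosen only to show how two asymmetric 3D kernels in series compare with one full kernel. The helper name `conv3d_weight_count` is hypothetical, and bias terms are ignored.

```python
def conv3d_weight_count(c_in, c_out, kernel):
    """Number of weights in one 3D convolution layer (bias ignored).

    kernel is a (time, height, width) tuple.
    """
    k_t, k_h, k_w = kernel
    return c_in * c_out * k_t * k_h * k_w


# Assumed example sizes; the abstract does not give the actual values.
channels = 64

# One full 3x3x3 convolution.
full = conv3d_weight_count(channels, channels, (3, 3, 3))

# Two asymmetric convolutions in series (assumed split):
# a 3x1x3 kernel covering the horizontal direction, then
# a 3x3x1 kernel covering the vertical direction.
split = (
    conv3d_weight_count(channels, channels, (3, 1, 3))
    + conv3d_weight_count(channels, channels, (3, 3, 1))
)

print("full kernel weights: ", full)
print("split kernel weights:", split)
print("ratio split / full:  ", split / full)
```

Under these assumed sizes the two asymmetric layers together hold two thirds of the weights of the single full kernel, while adding one more layer (and one more nonlinearity, if an activation is placed between them). Whether the paper's split follows this exact shape is not stated in the abstract.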