Computer Engineering & Science, 2021, Vol. 43, Issue 12: 2206-2215.
LU Wei-zhong1,2, SONG Zheng-wei1, WU Hong-jie1,2, CAO Yan1, DING Yi-jie1,2, ZHANG Yu3
Received: 2020-04-30
Revised: 2020-09-08
Accepted: 2021-12-25
Online: 2021-12-25
Published: 2021-12-31
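Abstract: Behavior detection is a highly active research topic in video understanding and computer vision. It has attracted wide attention from researchers at home and abroad and is widely applied in intelligent surveillance, human-computer interaction, and other fields. With advances in technology, deep learning has achieved major breakthroughs in image classification, and applying deep-learning-based recognition methods to human behavior detection has become a research hotspot within the field. Against this background, this paper first introduces several datasets commonly used for behavior detection, together with the state of deep learning research in behavior detection in recent years. It then analyzes the basic pipeline of behavior detection methods and several commonly used deep-learning-based detection methods. Finally, it discusses the open problems and future development trends of human behavior detection methods in terms of method performance, application prospects, and related aspects.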
LU Wei-zhong, SONG Zheng-wei, WU Hong-jie, CAO Yan, DING Yi-jie, ZHANG Yu. Overview of human behavior detection methods based on deep learning[J]. Computer Engineering & Science, 2021, 43(12): 2206-2215.