• Journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science ›› 2023, Vol. 45 ›› Issue (07): 1216-1225.

• Graphics and Images •

An online multi-pedestrian tracking method with Mask R-CNN

CAO Yu-dong1, CHEN Dong-hao1, CAO Rui2, ZHAO Lang1

  1. School of Electronics and Information Engineering, Liaoning University of Technology, Jinzhou 121001, China
  2. School of Automation and Electrical Engineering, Dalian Jiaotong University, Dalian 116028, China
  • Received: 2021-11-09 Revised: 2022-05-04 Accepted: 2023-07-25 Online: 2023-07-25 Published: 2023-07-11

Abstract: Pedestrian detection and tracking have attracted considerable attention in computer vision. This paper proposes an improved multi-pedestrian tracking model that builds on the Deep SORT framework and integrates Mask R-CNN to perform pedestrian detection, tracking, and pose estimation. Anchor boxes whose aspect ratios better match pedestrian targets replace the default anchor boxes in the region proposal network (RPN), which speeds up the model and improves performance without adding computation. An attention mechanism is introduced into the deep residual network: the lightweight SKNet adaptively selects the most suitable convolution kernel, strengthening the feature representation of detected targets. A histogram-of-gradients feature fused with color information replaces the convolutional feature, raising the success rate of appearance-feature association matching in Deep SORT so that pedestrians can be tracked effectively under occlusion. Ablation studies verify the contribution of each improvement, and the proposed model is compared with current mainstream pedestrian detection and tracking models. Experimental results show that the improved model is effective: on the MOT16 tracking dataset its MOTA is 6% higher than that of NSH, and on public datasets it outperforms the compared models. The model still tracks pedestrians effectively when the background moves or targets are occluded.

Key words: pedestrian detection, pedestrian tracking, pose estimation
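
The abstract reports tracking quality as MOTA (multiple object tracking accuracy). As a reading aid, the following minimal sketch computes MOTA under the standard CLEAR MOT definition, 1 - (FN + FP + IDSW) / GT, with all counts summed over every frame of a sequence. The function name `mota` and the example counts are illustrative assumptions, not code or results from the paper.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Return MOTA = 1 - (FN + FP + IDSW) / GT.

    All arguments are totals accumulated over every frame of a sequence;
    num_gt is the total number of ground-truth objects.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


# Hypothetical counts, for illustration only (not results from the paper).
score = mota(false_negatives=300, false_positives=150,
             id_switches=50, num_gt=1000)
print(f"MOTA = {score:.1%}")
```

Because errors can outnumber ground-truth objects, MOTA can be negative; a perfect tracker scores 1.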