• Official journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science

• Graphics and Images

Small-target pedestrian detection based on multi-scale feature fusion

张思宇1,2,张轶1,2   

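  (1. College of Computer Science, Sichuan University, Chengdu, Sichuan 610065;
    2. National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu, Sichuan 610065)
  • Received: 2019-01-25  Revised: 2019-04-24  Online: 2019-09-25  Published: 2019-09-25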

Small target pedestrian detection based on multi-scale feature fusion

ZHANG Si-yu1,2, ZHANG Yi1,2

  (1. College of Computer Science, Sichuan University, Chengdu 610065;
    2. National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China)
  • Received: 2019-01-25  Revised: 2019-04-24  Online: 2019-09-25  Published: 2019-09-25

Abstract:

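To address the missed and false detections of small targets in the current SSD, this work combines deconvolution with feature fusion and proposes the hgSSD model. The original SSD feature layers are deconvolved and then combined with shallower features to detect small-target pedestrians in complex scenes. To preserve shallow network features, improve real-time performance, and save computing resources, hgSSD uses VGG16 as its base network rather than the deeper ResNet101. To strengthen small-target detection, Conv3_3 in VGG16 is adapted into a feature layer and added to training. The fused network is more complex than SSD but largely retains real-time performance; it detects most of the small targets that the SSD network misses, and its detection accuracy is also higher than that of the SSD model. With a box confidence score threshold of 0.3, it detects most of the small targets that SSD misses. On VOC2007+2012, the Average Precision for pedestrian detection rises from 0.765 with SSD to 0.83.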

Key words: small-target pedestrian detection, multi-scale prediction, feature fusion, deconvolutional neural network, deep learning
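
The deconvolve-then-fuse step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes single-channel feature maps, a 2x2 transposed-convolution kernel with stride 2 (so the deep map is upsampled exactly 2x), and element-wise summation as the fusion operator. The abstract does not specify any of these choices, and the names `deconv2x` and `fuse` are hypothetical.

```python
def deconv2x(feature, kernel):
    """Transposed convolution with a 2x2 kernel and stride 2.

    Each input value is multiplied by the kernel and written into its own
    2x2 output block, so height and width both double.
    (Assumed configuration; the abstract gives no kernel size or stride.)
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] += feature[i][j] * kernel[di][dj]
    return out


def fuse(shallow, upsampled):
    """Fuse two same-sized maps by element-wise sum (assumed fusion operator)."""
    if len(shallow) != len(upsampled) or len(shallow[0]) != len(upsampled[0]):
        raise ValueError("feature maps must have the same spatial size")
    return [[a + b for a, b in zip(row_s, row_u)]
            for row_s, row_u in zip(shallow, upsampled)]


# Toy data: a 2x2 "deep" map and a 4x4 "shallow" map.
deep = [[1.0, 2.0], [3.0, 4.0]]
shallow = [[0.5] * 4 for _ in range(4)]
kernel = [[1.0, 1.0], [1.0, 1.0]]  # all-ones kernel copies each value into its block

fused = fuse(shallow, deconv2x(deep, kernel))
```

In a trained network the kernel weights are learned and the maps have many channels; the sketch only shows how a coarse, deep map is brought to the resolution of a shallower one before the two are combined.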

Abstract:

To address missed and false detections of small targets in the single shot multibox detector (SSD), we propose an hourglass SSD model, called hgSSD, based on deconvolution and feature fusion. It deconvolves the original SSD feature layers and combines them with shallower features to detect small-target pedestrians in complex scenes. To preserve shallow network features, maintain real-time performance, and save computing resources, we use VGG-16 rather than the deeper ResNet-101 as the base network. To strengthen small-target detection, Conv3_3 in VGG-16 is adapted into a feature layer and added to training. The fused network is more complex than the conventional SSD, but real-time performance is largely preserved. It detects most of the small targets that the conventional SSD misses, and its detection accuracy is higher than that of the conventional SSD model. With a box confidence score threshold of 0.3, it detects most of the small targets missed by the conventional SSD. On VOC 2007+2012, the average precision for pedestrian detection increases from 0.765 with the conventional SSD to 0.83.
 

Key words: small target pedestrian detection, multi-scale prediction, feature fusion, deconvolutional neural network, deep learning
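
The role of the 0.3 confidence threshold mentioned in the abstract can be sketched as a simple post-processing filter. This is an illustration under stated assumptions, not the paper's code: the detection record layout, the sample scores, the function name `filter_by_confidence`, and the choice of `>=` rather than `>` are all hypothetical.

```python
def filter_by_confidence(detections, threshold=0.3):
    """Keep only detections whose confidence score reaches the threshold."""
    return [d for d in detections if d["score"] >= threshold]


# Made-up detections: (x_min, y_min, x_max, y_max) boxes with confidence scores.
detections = [
    {"label": "person", "score": 0.92, "box": (12, 30, 48, 120)},
    {"label": "person", "score": 0.35, "box": (200, 80, 212, 110)},  # small target
    {"label": "person", "score": 0.10, "box": (5, 5, 9, 14)},
]

kept = filter_by_confidence(detections)
```

A lower threshold lets through low-scoring boxes, which is where small targets tend to fall, at the cost of admitting more false positives.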