• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (12): 2205-2214.

• Graphics and Images •

A MFFBSNet crowd counting algorithm based on multi-scale feature fusion and background suppression

ZHAO Jia-bin 1,2; XU Hui-ying 1; ZHU Rong 2,3,4; CHEN Bin 2,5; WANG Xiao-lin 2,5; ZHU Xin-zhong 1

  1. School of Computer Science and Technology (School of Artificial Intelligence), Zhejiang Normal University, Jinhua 321004, China
  2. Jiaxing Key Laboratory of Smart Transportations, Jiaxing 314001, China
  3. College of Information Engineering, Jiaxing Nanhu University, Jiaxing 314001, China
  4. Jiaxing Key Laboratory of Intelligent Computation and Data Science, Jiaxing 314001, China
  5. College of Information Science and Engineering, Jiaxing University, Jiaxing 314001, China

  • Received: 2023-08-09  Revised: 2024-01-10  Accepted: 2024-12-25  Online: 2024-12-25  Published: 2024-12-23
  • Funding:
    National Natural Science Foundation of China (62376252); Zhejiang Provincial Natural Science Foundation (LZ22F030003)

Abstract: To address scale variation, uneven distribution, and background occlusion of dense crowds in complex scenes, this paper proposes MFFBSNet, a crowd counting algorithm based on multi-scale feature fusion and background suppression. The first 13 layers of the Visual Geometry Group network VGG-16 serve as the front end of the network. Atrous spatial pyramid pooling (ASPP) and a lightweight pyramid split attention (PSA) mechanism are combined into a multi-scale feature fusion module that handles scale variation in dense crowds. In the middle of the network, spatial and channel attention mechanisms recalibrate the feature maps to highlight head regions in the image. The back end uses atrous (dilated) convolution, which enlarges the receptive field without reducing image resolution, to generate a background segmentation attention map that suppresses background noise and improves the quality of the crowd density map. Experimental results on three public datasets, ShanghaiTech, UCF_CC_50, and NWPU-Crowd, show that MFFBSNet achieves higher counting accuracy than methods such as MCNN, SwitchCNN, and CSRNet.


Key words: dense crowd counting, multi-scale fusion, background suppression, density map
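
Two ideas from the abstract can be illustrated with a small sketch: atrous convolution widens the receptive field while the output keeps the input resolution, and a background attention map rescales the density map before it is summed into a count. The code below is a toy illustration in plain Python, not the authors' implementation. All function names, the kernel, and the hand-written attention mask are assumptions made for this example; in MFFBSNet the attention map is generated by the network.

```python
def effective_kernel(k, d):
    """Span (in pixels) covered by a k-tap kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def dilated_conv2d_same(img, kernel, d):
    """Dilated 2-D cross-correlation with zero padding.

    The output has the same height and width as the input, so the
    receptive field grows with d while the resolution is unchanged.
    """
    h, w = len(img), len(img[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy = y + (i - half) * d  # taps are spaced d pixels apart
                    xx = x + (j - half) * d
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside
                        acc += kernel[i][j] * img[yy][xx]
            out[y][x] = acc
    return out


def suppress_background(density, attention):
    """Element-wise product of a density map and an attention map in [0, 1]."""
    return [[dv * av for dv, av in zip(drow, arow)]
            for drow, arow in zip(density, attention)]


def crowd_count(density):
    """The estimated count is the sum over the density map."""
    return sum(sum(row) for row in density)


# Toy density map: the right two columns hold spurious background responses.
density = [
    [0.5, 0.5, 0.25, 0.0],
    [0.5, 0.5, 0.25, 0.0],
    [0.0, 0.0, 0.0, 0.25],
    [0.0, 0.0, 0.0, 0.25],
]
# Hand-written mask (assumption): 1 = crowd region, 0 = background.
attention = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]

raw_count = crowd_count(density)
clean_count = crowd_count(suppress_background(density, attention))
print(effective_kernel(3, 1), effective_kernel(3, 2), effective_kernel(3, 4))
print(raw_count, clean_count)
```

In this toy case the mask removes the responses outside the crowd region, so the summed count drops from the raw value to the value contributed by the crowd region alone. A 3x3 kernel with dilation rate 2 covers a 5x5 window while still using only nine weights.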