• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (5): 876-887. doi: 10.3969/j.issn.1007-130X.2026.05.011

• Graphics and Images •

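A real-time object detection method for crowded and occluded scenes

SHENG Wei, LIU Mingjian, LIU Dianchen

  1. (1. School of Information Engineering, Dalian Ocean University, Dalian 116023, Liaoning; 2. Key Laboratory of Environment Controlled Aquaculture, Ministry of Education, Dalian Ocean University, Dalian 116023, Liaoning)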

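  • Received: 2024-09-09 Revised: 2024-10-14 Online: 2026-05-25 Published: 2026-05-21
  • Supported by:
    Science and Technology Plan of Liaoning Province (2025-MSLH-120); Basic Scientific Research Project of the Education Department of Liaoning Province (LJ212410158018); Special Fund for Basic Scientific Research Operating Expenses of Undergraduate Universities of Liaoning Province (2024JBQNZ007)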


A real-time object detection method for crowded and occluded scenes

SHENG Wei, LIU Mingjian, LIU Dianchen

  1. (1. School of Information Engineering, Dalian Ocean University, Dalian 116023;
    2. Key Laboratory of Environment Controlled Aquaculture, Ministry of Education,
    Dalian Ocean University, Dalian 116023, China)
  • Received: 2024-09-09 Revised: 2024-10-14 Online: 2026-05-25 Published: 2026-05-21

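Abstract: Object detection in crowded scenes is crucial for real-time systems, but limited hardware resources and occlusion lead to detection latency and reduced accuracy. This paper proposes an occlusion-aware lightweight object detection method consisting of three parts: a backbone, feature fusion, and output prediction. The method extracts features with fast network blocks and uses a positional attention mechanism to focus on occlusion boundaries. The spatial pyramid pooling feature concatenation module in the backbone reduces information loss and improves the recognition of people at different scales and under occlusion. The feature fusion part uses grouped shuffle convolution to optimize feature flow without adding computational burden. The output prediction part uses a task-aligned single-stage object detection method to improve recognition accuracy under occlusion. Experimental results show that the proposed method achieves a recall of 66.8% on the WiderPerson dataset, 2.0 percentage points higher than YOLOv8-n, with only 1.8×10⁶ model parameters, and that it runs more efficiently than the other models. On the UpDown dataset, the classification error rate and the undetected object error rate are 2.6% and 1.3%, which are 0.4 and 0.7 percentage points lower than those of YOLOv8-n, respectively. The experiments verify the method's efficiency on resource-constrained devices.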


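Key words: crowd detection, occlusion detection in human behavior, resource-constrained computing, inter-class and intra-class occlusion, reinforced coordinate attention mechanism, spatial pyramid pooling feature concatenation module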

Abstract: Object detection in crowded scenarios is crucial in real-time systems, but it faces challenges such as limited hardware resources and occlusion, which lead to detection delays and reduced accuracy. This paper proposes an occlusion-aware lightweight object detection method (OLODN) comprising three parts: a backbone, feature fusion, and output prediction. The method employs fast network blocks for feature extraction and uses a positional attention mechanism to focus on occlusion boundaries. The spatial pyramid pooling feature concatenation module in the backbone reduces information loss and improves the recognition of individuals at varying scales and under occlusion. The feature fusion section adopts grouped shuffle convolution to optimize feature flow without increasing computational overhead. The output prediction section employs a task-aligned single-stage object detection method to improve recognition accuracy under occlusion. Experimental results show that the method achieves 66.8% recall on the WiderPerson dataset, which is 2.0 percentage points higher than that of YOLOv8-n, with only 1.8×10⁶ model parameters and better runtime efficiency than the other models compared. On the Up-Down dataset, the classification error rate and the undetected object error rate are 2.6% and 1.3%, respectively, which are 0.4 and 0.7 percentage points lower than those of YOLOv8-n. The experiments validate the method's efficiency on resource-constrained devices.

Key words: crowd detection, occlusion detection in human behavior, resource-constrained computing, inter-class and intra-class occlusion, reinforced coordinate attention mechanism, spatial pyramid pooling feature concatenation module
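
The abstract does not say how the grouped shuffle convolution in the feature fusion part is implemented. As a rough illustration only, the sketch below shows the generic channel shuffle step that is commonly paired with grouped convolutions so that information can move between groups at no extra arithmetic cost. The function name `channel_shuffle` and the use of a plain list of channel labels are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Interleave channels across groups (hypothetical helper, not the paper's code).

    The channels are viewed as a (groups, per_group) grid, the grid is
    transposed, and the result is flattened back into a single list.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # Walk the grid column by column: position i of every group, in group order.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# Six channels in two groups: [0, 1, 2] and [3, 4, 5].
shuffled = channel_shuffle(list(range(6)), groups=2)
print(shuffled)  # [0, 3, 1, 4, 2, 5]
```

After the shuffle, each consecutive block of channels contains members of every original group, so a following grouped convolution sees features from all groups. The operation is a pure reordering, which is consistent with the abstract's claim of improved feature flow without added computation.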