• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2026, Vol. 48, Issue (3): 531-539.

• Graphics and Images •

Human pose estimation combining mixed attention and multi-scale feature

GU Xuejing, LI Yanru, YANG Lanxiao

  (1. College of Electrical Engineering, North China University of Science and Technology, Tangshan, Hebei 063210, China;
    2. Tangshan Digital Media Engineering Technology Research Center, Tangshan, Hebei 063000, China;
    3. College of Artificial Intelligence, North China University of Science and Technology, Tangshan, Hebei 063210, China)

  • Received: 2024-07-14 Revised: 2024-09-09 Online: 2026-03-25 Published: 2026-03-25
  • Funding:
    Tangshan Science and Technology Innovation Team Training Program (18130221A)

Abstract: To address the low accuracy of multi-person pose estimation in occluded scenes, a human pose estimation model named DAW-YOLOPose is proposed, which combines a mixed attention mechanism with multi-scale sequence features. First, the mixed local channel attention (MLCA) mechanism is used to improve the backbone network of YOLOv8Pose, capturing and propagating spatial and channel information without increasing the number of model parameters, thereby improving the network's feature representation. Second, a new multi-scale sequence feature fusion network is proposed to strengthen the extraction of multi-scale feature information and to fuse feature maps of different scales. Finally, the gradient gain allocation strategy of the Wise-IoU v3 loss function is used to improve the discrimination of high-quality anchor boxes and to reduce the negative impact of low-quality samples on model training. Experimental results on the MSCOCO dataset show that, compared with YOLOv8Pose, DAW-YOLOPose improves mAP@0.5, mAP@0.5:0.95, and recall by 2.7, 1.4, and 1.9 percentage points, respectively, achieving better pose estimation performance.

Key words: YOLOPose, human pose estimation, attention mechanism, multi-scale sequence feature, loss function
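
The gradient gain allocation of Wise-IoU v3 mentioned in the abstract can be illustrated with a minimal sketch. It follows the formulation in the original Wise-IoU paper as commonly stated, not the authors' DAW-YOLOPose code, which is not shown here. The function names, the (x1, y1, x2, y2) box layout, and the hyperparameter values alpha = 1.9 and delta = 3 are assumptions for illustration. The running mean of the IoU loss is passed in as a plain number, and in a real training framework the outlier degree and the enclosing-box term would be detached from the gradient.

```python
import math


def iou_xyxy(a, b):
    """Plain IoU of two non-degenerate boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def gradient_gain(beta, alpha=1.9, delta=3.0):
    """Non-monotonic gain r = beta / (delta * alpha**(beta - delta)).

    beta is the outlier degree of an anchor box. alpha and delta are
    assumed hyperparameter values, not taken from the abstract.
    """
    return beta / (delta * alpha ** (beta - delta))


def wiou_v3_loss(pred, target, mean_iou_loss, alpha=1.9, delta=3.0):
    """Sketch of the Wise-IoU v3 loss for one predicted/target box pair.

    mean_iou_loss stands in for the running mean of the IoU loss that a
    trainer would maintain; it must be positive.
    """
    l_iou = 1.0 - iou_xyxy(pred, target)

    # Distance between box centres.
    dx = (pred[0] + pred[2]) / 2 - (target[0] + target[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (target[1] + target[3]) / 2

    # Width and height of the smallest box enclosing both boxes.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])

    # Distance-based attention term (Wise-IoU v1).
    r_wiou = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))

    # Outlier degree: this box's IoU loss relative to the running mean.
    beta = l_iou / mean_iou_loss

    return gradient_gain(beta, alpha, delta) * r_wiou * l_iou
```

Under these assumptions the gain peaks at beta = 1 / ln(alpha), about 1.56 for alpha = 1.9, and shrinks for both very small and very large outlier degrees. Boxes with a large outlier degree (low-quality samples) therefore contribute a smaller gradient, which matches the behaviour the abstract describes.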