• Official journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science (计算机工程与科学), 2024, Vol. 46, Issue 05: 852-860.

• Graphics and Images •

  • Funding:
    National Natural Science Foundation of China (61601213); Education Department of Liaoning Province projects (LJ2020FWL004, 2019-ZD-0038)

A multi-person pose estimation correction algorithm based on improved YOLOv5

ZHAO Jin-yuan, JIA Di

  1. (School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125100, China)
  • Received: 2023-04-07 Revised: 2023-09-19 Accepted: 2024-05-25 Online: 2024-05-25 Published: 2024-05-30

Abstract: Multi-person pose estimation in crowded scenes still suffers from problems such as small detection targets, which lower pose estimation accuracy. To address this, this paper proposes a multi-person pose estimation correction algorithm based on an improved YOLOv5. First, a jump attention module is integrated into the YOLOv5 backbone network to help the network find regions of interest in the image. Second, in the neck network, a weighted bidirectional feature pyramid improves feature fusion across feature maps of different scales, and the jump attention module is used jointly with a Transformer encoder so that the network captures global information and rich contextual information. Third, an additional detection head is added to the detection part to make the network more sensitive to tiny objects. Finally, the keypoint object information predicted by the network is used to correct the pose object information, yielding the final multi-person pose estimation result. Experimental results show that, compared with YOLOv5, the proposed algorithm improves AP50 by 2.2% and AP75 by 3.3% on the COCO dataset, which supports its accuracy and robustness.

Key words: human pose estimation, jump attention mechanism, weighted feature pyramid, Transformer encoder, object detection
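
The abstract names a weighted bidirectional feature pyramid for fusing feature maps of different scales but does not state the fusion rule. As an illustration only, the sketch below assumes the "fast normalized fusion" commonly associated with BiFPN: each input gets a learnable scalar weight, the weights are clipped to be non-negative, and they are normalized to sum to roughly one. The function name `fast_normalized_fusion`, the `eps` value, and the toy inputs are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse equally sized (flattened) feature maps with normalized weights."""
    if not features or len(features) != len(weights):
        raise ValueError("need one weight per feature map")
    size = len(features[0])
    if any(len(f) != size for f in features):
        raise ValueError("feature maps must share one size")

    # ReLU-style clipping keeps every learnable weight non-negative.
    clipped = [max(0.0, w) for w in weights]
    # eps guards against division by zero when all weights are clipped to 0.
    total = sum(clipped) + eps

    fused = [0.0] * size
    for feature, w in zip(features, clipped):
        for i, value in enumerate(feature):
            fused[i] += (w / total) * value
    return fused


# Hypothetical inputs: a shallow and a deep feature map, assumed to be
# already resized to the same resolution and flattened.
shallow = [1.0, 2.0, 3.0]
deep = [3.0, 4.0, 5.0]
fused = fast_normalized_fusion([shallow, deep], weights=[1.0, 3.0])
```

In a real network the weights would be trained parameters and the inputs would be tensors resampled to a common resolution; the plain lists here only make the arithmetic easy to follow.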