• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (3): 531-539.

• Graphics and Images •

Human pose estimation combining mixed attention and multi-scale feature

GU Xuejing, LI Yanru, YANG Lanxiao

  (1. College of Electrical Engineering, North China University of Science and Technology, Tangshan 063210;
    2. Tangshan Digital Media Engineering Technology Research Center, Tangshan 063000;
    3. College of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China)
  • Received: 2024-07-14 Revised: 2024-09-09 Online: 2026-03-25 Published: 2026-03-25

Abstract: To address the low accuracy of multi-person pose estimation in occluded scenes, a human pose estimation model named DAW-YOLOPose is proposed, which combines a mixed attention mechanism with multi-scale sequence features. First, the mixed local channel attention (MLCA) mechanism is used to improve the backbone network of YOLOv8Pose, effectively capturing and transmitting spatial and channel information without increasing the number of model parameters, thereby improving the network's feature representation. Second, a new multi-scale sequence feature fusion network is proposed to strengthen the extraction of multi-scale feature information and to integrate feature maps of different scales. Finally, the gradient gain allocation strategy of the Wise-IoU v3 loss function is used to improve the ability to distinguish high-quality anchor boxes and to reduce the negative impact of low-quality samples on model training. Experimental results on the MSCOCO dataset show that, compared with YOLOv8Pose, DAW-YOLOPose improves mAP@0.5, mAP@0.5:0.95, and recall by 2.7, 1.4, and 1.9 percentage points, respectively, achieving better pose estimation performance.

Key words: YOLOPose, human pose estimation, attention mechanism, multi-scale sequence feature, loss function
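
The abstract names the gradient gain allocation strategy of Wise-IoU v3 but does not give its formulas. As a rough illustration only, the sketch below follows the commonly cited Wise-IoU v3 formulation for a single predicted/target box pair: an IoU loss scaled by a center-distance factor and by a non-monotonic gain that depends on the outlier degree `beta`. The function names, the `(x1, y1, x2, y2)` box layout, and the hyperparameter values `alpha=1.9` and `delta=3.0` are assumptions, not details taken from the paper, and this is not the authors' implementation.

```python
import math


def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def gradient_gain(beta, alpha=1.9, delta=3.0):
    """Non-monotonic gain r = beta / (delta * alpha ** (beta - delta)).

    beta is the outlier degree: the box's IoU loss divided by a running
    mean of IoU losses. The gain is small for very low beta (easy,
    high-quality boxes) and for very high beta (low-quality outliers).
    """
    return beta / (delta * alpha ** (beta - delta))


def wiou_v3(pred, target, mean_iou_loss, alpha=1.9, delta=3.0):
    """Wise-IoU v3 style loss for one box pair (illustrative sketch).

    mean_iou_loss is assumed to be a running mean of IoU losses that the
    training loop maintains. In an autograd framework, the enclosing-box
    term and beta would be detached from the gradient; plain Python has
    no gradients, so that step does not appear here.
    """
    l_iou = 1.0 - iou(pred, target)
    # Box centers.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    # Distance-based attention factor (the v1 term).
    r_wiou = math.exp(((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / (wg ** 2 + hg ** 2))
    beta = l_iou / mean_iou_loss
    return gradient_gain(beta, alpha, delta) * r_wiou * l_iou
```

With these assumed hyperparameters the gain equals 1 at `beta == delta` and peaks near `beta = 1 / ln(alpha)`, so boxes of ordinary quality receive the largest share of the gradient, which matches the abstract's stated goal of limiting the influence of low-quality samples.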