Human pose estimation combining mixed attention and multi-scale feature

Computer Engineering & Science, 2026, Vol. 48, Issue (3): 531-539.

• Graphics and Images •

GU Xuejing, LI Yanru, YANG Lanxiao

(1. College of Electrical Engineering, North China University of Science and Technology, Tangshan 063210;
2. Tangshan Digital Media Engineering Technology Research Center, Tangshan 063000;
3. College of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China)

Received: 2024-07-14 Revised: 2024-09-09 Online: 2026-03-25 Published: 2026-03-25

Abstract

To address the low accuracy of multi-person pose estimation in occluded scenes, a human pose estimation model named DAW-YOLOPose is proposed, which combines a mixed attention mechanism with multi-scale sequence features. First, the mixed local channel attention (MLCA) mechanism is used to improve the backbone network of YOLOv8Pose, capturing and propagating spatial and channel information without increasing the number of model parameters and thereby strengthening the network's feature representation. Second, a new multi-scale sequence feature fusion network is proposed to enhance the extraction of multi-scale feature information and to integrate feature maps of different scales. Finally, the gradient gain allocation strategy of the Wise-IoU v3 loss function is used to improve the model's ability to distinguish high-quality anchor boxes and to reduce the negative impact of low-quality samples on training. Experimental results on the MSCOCO dataset show that, compared with YOLOv8Pose, DAW-YOLOPose improves mAP@0.5, mAP@0.5:0.95, and recall by 2.7, 1.4, and 1.9 percentage points, respectively, indicating better pose estimation performance.
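
The abstract names the gradient gain allocation strategy of Wise-IoU v3 but does not give its formula. The following is a minimal sketch of how such a non-monotonic gain could be computed, assuming the commonly described Wise-IoU formulation: an IoU loss scaled by a centre-distance penalty and by a gain that depends on how far a box's loss deviates from the mean. The function names, the plain-Python box format `(x1, y1, x2, y2)`, and the values `alpha=1.9` and `delta=3.0` are illustrative assumptions, not details taken from the paper.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def gradient_gain(iou_loss, mean_iou_loss, alpha=1.9, delta=3.0):
    """Non-monotonic gain r = beta / (delta * alpha ** (beta - delta)).

    beta is the outlier degree: this box's IoU loss relative to the mean loss.
    alpha and delta are assumed hyperparameters (not stated in the abstract).
    """
    beta = iou_loss / mean_iou_loss
    return beta / (delta * alpha ** (beta - delta))


def wiou_v3_loss(pred, target, mean_iou_loss, alpha=1.9, delta=3.0):
    """Sketch of a Wise-IoU v3 style loss for one predicted/target box pair."""
    iou_loss = 1.0 - iou(pred, target)
    # Offset between the two box centres.
    dx = (pred[0] + pred[2]) / 2 - (target[0] + target[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (target[1] + target[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    distance_penalty = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))
    gain = gradient_gain(iou_loss, mean_iou_loss, alpha, delta)
    return gain * distance_penalty * iou_loss
```

Under these assumptions the gain peaks for boxes of moderate quality and shrinks for both very well-fitted and very poorly fitted boxes, which is one way to read the abstract's claim that low-quality samples have less influence on training. In a training framework the mean loss would typically be a running average, and the gain and the enclosing-box term would be excluded from gradient computation; this sketch omits both.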

Key words: YOLOPose, human pose estimation, attention mechanism, multi-scale sequence feature, loss function

GU Xuejing, LI Yanru, YANG Lanxiao. Human pose estimation combining mixed attention and multi-scale feature[J]. Computer Engineering & Science, 2026, 48(3): 531-539.
