Human pose estimation combining mixed attention and multi-scale feature

Computer Engineering & Science, 2026, Vol. 48, Issue (3): 531-539.

Funding:
Tangshan Science and Technology Innovation Team Training Program (18130221A)

Human pose estimation combining mixed attention and multi-scale feature

GU Xuejing, LI Yanru, YANG Lanxiao

(1. College of Electrical Engineering, North China University of Science and Technology, Tangshan 063210;
2. Tangshan Digital Media Engineering Technology Research Center, Tangshan 063000;
3. College of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China)

Received: 2024-07-14 Revised: 2024-09-09 Online: 2026-03-25 Published: 2026-03-25

Abstract

Abstract: To address the low accuracy of multi-person pose estimation in occluded scenes, this paper proposes DAW-YOLOPose, a human pose estimation model that combines a mixed attention mechanism with multi-scale sequence features. First, the mixed local channel attention (MLCA) mechanism is used to improve the backbone network of YOLOv8Pose, capturing and propagating spatial and channel information without increasing the number of model parameters and thereby improving the network's feature representation. Second, a new multi-scale sequence feature fusion network is proposed to strengthen multi-scale feature extraction and to fuse feature maps of different scales. Finally, the gradient gain allocation strategy of the Wise-IoU v3 loss function is adopted to improve the model's ability to distinguish high-quality anchor boxes and to reduce the negative impact of low-quality samples on training. Experiments on the MSCOCO dataset show that, compared with YOLOv8Pose, DAW-YOLOPose improves mAP@0.5, mAP@0.5:0.95, and recall by 2.7, 1.4, and 1.9 percentage points, respectively, achieving better pose estimation performance.

Key words: YOLOPose, human pose estimation, attention mechanism, multi-scale sequence feature, loss function
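
The abstract refers to the gradient gain allocation strategy of Wise-IoU v3 without spelling it out. The sketch below illustrates one common formulation of that loss for a single pair of axis-aligned boxes. It is not the authors' implementation: the function names, the hyperparameter values alpha = 1.9 and delta = 3.0, and the use of a caller-supplied mean IoU loss (standing in for the running mean kept during training) are all assumptions made for illustration.

```python
import math


def iou_and_enclosing(pred, gt):
    """Return (IoU, enclosing width, enclosing height) for (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_pred = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_pred + area_gt - inter
    # Size of the smallest box that encloses both inputs.
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    return inter / union, wg, hg


def gradient_gain(beta, alpha=1.9, delta=3.0):
    """Non-monotonic gain r = beta / (delta * alpha ** (beta - delta)).

    beta is the outlier degree (IoU loss of one box divided by the mean IoU loss).
    alpha and delta are assumed hyperparameter values, not taken from the source.
    """
    return beta / (delta * alpha ** (beta - delta))


def wiou_v3(pred, gt, mean_iou_loss, alpha=1.9, delta=3.0):
    """Sketch of a Wise-IoU v3 style loss for one predicted / ground-truth box pair."""
    iou, wg, hg = iou_and_enclosing(pred, gt)
    l_iou = 1.0 - iou
    # Squared distance between box centres, normalised by the enclosing box size.
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    r_wiou = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))
    # In a training framework the gain and the enclosing-box term would be
    # treated as constants (detached from the gradient); plain floats need no such step.
    beta = l_iou / mean_iou_loss
    return gradient_gain(beta, alpha, delta) * r_wiou * l_iou
```

With these assumed settings the gain equals 1 when beta equals delta, peaks at a moderate outlier degree, and shrinks for very large beta, which is how low-quality boxes end up contributing less to the update.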

GU Xuejing, LI Yanru, YANG Lanxiao. Human pose estimation combining mixed attention and multi-scale feature[J]. Computer Engineering & Science, 2026, 48(3): 531-539.
