• Official journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2022, Vol. 44 ›› Issue (02): 276-282.

• Graphics and Images •

An object tracking algorithm based on mixed attention mechanism

FENG Qi-yao (冯琪堯)1,2, ZHANG Jing-lei (张惊雷)1,2

  1. School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300384, China
  2. Tianjin Key Laboratory for Control Theory and Applications in Complicated Systems, Tianjin 300384, China

  • Received: 2020-09-21 Revised: 2020-12-17 Accepted: 2022-02-25 Online: 2022-02-25 Published: 2022-02-17

Abstract: The fully-convolutional Siamese network tracking algorithm (SiamFC) is prone to tracking failure in complex scenes involving object deformation, occlusion, and fast motion. To address this problem, an algorithm (SiamMA) is proposed that uses a mixed attention mechanism to strengthen the recognition ability of the network. First, an image stacking and cropping method is introduced in the training stage to build self-adversarial training sample pairs, which simulate the complex scenes encountered in actual tracking and give the trained model stronger generalization. Second, a mixed attention mechanism is proposed that fuses spatial attention and channel attention modules in different branches of the network, which effectively suppresses background interference in the feature maps and improves the robustness of the algorithm. The algorithm is evaluated on four public datasets, including GOT-10k and UAV123. The experimental results show that it improves on six classic algorithms, including SiamFC and KCF, in the main performance metrics such as tracking success rate and precision, and that its average speed reaches 60 frames per second.

Key words: object tracking, siamese network, mixed attention mechanism, self-adversarial training sample pairs
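
The idea summarized in the abstract, attention-weighted features from two Siamese branches followed by a cross-correlation, can be illustrated with a minimal sketch. This is not the authors' SiamMA implementation. The gates below have no learned weights, the choice of channel attention on the template branch and spatial attention on the search branch is an assumption made only for illustration (the abstract does not say which branch uses which module), and every function name is hypothetical. Feature maps are plain nested lists indexed as [channel][row][column].

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat):
    """Scale each channel by a sigmoid gate of its global average.

    Assumption: a weight-free stand-in for a learned channel attention module.
    """
    out = []
    for channel in feat:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        out.append([[v * gate for v in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each location by a sigmoid gate of the mean across channels.

    Assumption: a weight-free stand-in for a learned spatial attention module.
    """
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    gates = [[sigmoid(sum(feat[k][i][j] for k in range(c)) / c)
              for j in range(w)] for i in range(h)]
    return [[[feat[k][i][j] * gates[i][j] for j in range(w)]
             for i in range(h)] for k in range(c)]


def cross_correlate(search, template):
    """SiamFC-style response map: slide the template over the search
    features and sum the products over all channels and positions."""
    c = len(search)
    sh, sw = len(search[0]), len(search[0][0])
    th, tw = len(template[0]), len(template[0][0])
    resp = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            total = 0.0
            for k in range(c):
                for u in range(th):
                    for v in range(tw):
                        total += search[k][i + u][j + v] * template[k][u][v]
            row.append(total)
        resp.append(row)
    return resp


def peak(resp):
    """Return (row, col) of the largest response value."""
    best = max((v, i, j) for i, row in enumerate(resp) for j, v in enumerate(row))
    return best[1], best[2]


if __name__ == "__main__":
    # Toy data: a 2-channel 2x2 template pasted into an empty 4x4 search map.
    template = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    search = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
    for k in range(2):
        for u in range(2):
            for v in range(2):
                search[k][1 + u][2 + v] = template[k][u][v]
    resp = cross_correlate(spatial_attention(search), channel_attention(template))
    print(peak(resp))
```

In this toy setup the response map is 3x3 and its peak falls at the offset where the template was pasted, which is how a SiamFC-style tracker localizes the target in the search region.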