  • Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2024, Vol. 46, Issue (02): 253-263.

• Section: Graphics and Images

  • Funding: National Natural Science Foundation of China (62002285)

A multi-branch fine-grained recognition method based on dynamic localization and feature fusion

YANG Xiao-qiang (杨晓强), HUANG Jia-cheng (黄加诚)

  1. (College of Computer Science & Technology, Xi’an University of Science and Technology, Xi’an, Shaanxi 710000, China)
  • Received: 2022-12-05; Revised: 2023-02-26; Accepted: 2024-02-25; Online: 2024-02-25; Published: 2024-02-24

Abstract: Fine-grained classification is difficult because inter-class differences are small while intra-class differences are large. To address this, an improved end-to-end fine-grained classification model, TBformer, is proposed on the basis of Swin Transformer. To counter the interference of complex backgrounds with recognition, a dynamic localization module (DLModule) that combines ECA, ResNet50, and SCDA is used to capture the key object, and a three-branch feature extraction module built on DLModule is designed to strengthen the extraction of discriminative target features. To fully exploit the rich fine-grained information carried by the three branch features, an ECA-based feature fusion method is proposed; it makes the fused features more comprehensive and precise and improves the robustness of the network for fine-grained classification. Experimental results show that, compared with the baseline method, TBformer improves accuracy by 3.19% on CUB-200-2011, 3.47% on Stanford Dogs, and 1.09% on NABirds.
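The abstract does not say how SCDA localizes the object inside DLModule. As a minimal sketch, assuming the standard SCDA recipe (Selective Convolutional Descriptor Aggregation: sum a C×H×W convolutional feature map over channels, keep positions above the mean, retain the largest connected region, and take its bounding box), the localization step could look like the following. The function names, the pure-Python nested-list representation, and the default stride of 32 (the overall downsampling of ResNet50's last stage) are illustrative assumptions. How TBformer applies ECA before this step and how it crops the input image are not specified here.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical representation: features[channel][row][col] as plain floats.
FeatureMap = List[List[List[float]]]
Box = Tuple[int, int, int, int]


def aggregation_map(features: FeatureMap) -> List[List[float]]:
    """Sum the activations over all channels into one H x W map."""
    height, width = len(features[0]), len(features[0][0])
    return [[sum(ch[r][c] for ch in features) for c in range(width)]
            for r in range(height)]


def largest_component(mask: List[List[bool]]) -> List[Tuple[int, int]]:
    """Return the cells of the largest 4-connected region of True values."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    best: List[Tuple[int, int]] = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground cell.
            seen[r][c] = True
            queue = deque([(r, c)])
            region: List[Tuple[int, int]] = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    return best


def scda_bbox(features: FeatureMap) -> Box:
    """Inclusive (top, left, bottom, right) box of the main object, in feature-map cells."""
    agg = aggregation_map(features)
    values = [v for row in agg for v in row]
    mean = sum(values) / len(values)
    # SCDA keeps only positions whose aggregated activation exceeds the mean.
    mask = [[v > mean for v in row] for row in agg]
    region = largest_component(mask)
    if not region:  # flat map: nothing stands out, so keep the whole map
        return 0, 0, len(agg) - 1, len(agg[0]) - 1
    rows = [y for y, _ in region]
    cols = [x for _, x in region]
    return min(rows), min(cols), max(rows), max(cols)


def to_image_box(box: Box, stride: int = 32) -> Box:
    """Map an inclusive feature-map box to a pixel box with an exclusive end."""
    top, left, bottom, right = box
    return top * stride, left * stride, (bottom + 1) * stride, (right + 1) * stride
```

For example, a 4×4 map whose strongest responses form a 2×2 block in the top-left corner yields the box (0, 0, 1, 1), i.e. pixels (0, 0, 64, 64) at stride 32. An isolated strong activation elsewhere is discarded because it is not part of the largest connected component, which is how this style of localization suppresses background clutter.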

Key words: fine-grained recognition, feature fusion, attention mechanism, multi-branch
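The abstract states only that the three branch features are fused with an ECA-based method. The sketch below assumes that ECA refers to Efficient Channel Attention (ECA-Net). In ECA, channel weights come from a 1D convolution across the pooled channel descriptor followed by a sigmoid. The kernel size adapts to the channel count C as k = |(log2 C + b)/γ|, rounded up to the nearest odd number, with γ = 2 and b = 1. Concatenating the three pooled branch vectors and reweighting them is a hypothetical fusion layout, not necessarily TBformer's. The function names and the fixed averaging kernel are also placeholders; ECA learns these weights during training.

```python
import math
from typing import List, Optional, Sequence


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive ECA kernel size: (log2(C) + b) / gamma, truncated, then made odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_weights(descriptor: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1D cross-correlation across channels (zero padding, no bias), then sigmoid."""
    n, k = len(descriptor), len(kernel)
    pad = k // 2
    weights = []
    for i in range(n):
        acc = 0.0
        for j in range(k):
            idx = i + j - pad
            if 0 <= idx < n:
                acc += kernel[j] * descriptor[idx]
        weights.append(1.0 / (1.0 + math.exp(-acc)))
    return weights


def eca_fuse(branches: Sequence[Sequence[float]],
             kernel: Optional[Sequence[float]] = None) -> List[float]:
    """Hypothetical fusion: concatenate pooled branch vectors, reweight each channel.

    The inputs are already globally pooled, so ECA's average-pooling step is the
    identity here and each value serves directly as its channel descriptor.
    """
    fused = [v for branch in branches for v in branch]
    if kernel is None:
        k = eca_kernel_size(len(fused))
        kernel = [1.0 / k] * k  # placeholder weights; real ECA learns them
    weights = eca_weights(fused, kernel)
    return [w * v for w, v in zip(weights, fused)]
```

With three branches of two channels each, eca_kernel_size(6) gives k = 1, so each channel is scaled by the sigmoid of its own value. For 6144 channels (for example, three hypothetical 2048-dimensional branch vectors) the same formula gives k = 7, so each weight also depends on a few neighbouring channels. This local cross-channel interaction without dimensionality reduction is what distinguishes ECA from squeeze-and-excitation style attention.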