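• Official journal of the China Computer Federation (CCF)
• China Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science, 2026, Vol. 48, Issue (5): 898-905. doi: 10.3969/j.issn.1007-130X.2026.05.013

• Section: Graphics and Images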

A feature fusion semantic segmentation model based on an attention mechanism

MA Dongmei, ZHU Qirong, LV Xuelong

  1. (College of Physics and Electronic Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2024-06-27  Revised: 2025-01-06  Online: 2026-05-25  Published: 2026-05-21
  • Funding: National Natural Science Foundation of China (61961037)

Abstract: The existing DeepLabV3+ semantic segmentation model is prone to mis-segmentation, low segmentation accuracy, and severe loss of detail. To address these problems, this paper proposes a feature-fusion semantic segmentation model based on an attention mechanism. First, a switchable atrous convolution is cascaded onto the atrous convolution branch of the model, allowing it to adapt more flexibly to features at different scales and thereby reducing mis-segmentation. Second, an RFEM module is introduced to capture multi-scale information from shallow features together with dependencies over different ranges, improving the model's performance. Third, intermediate-layer features of the model are extracted and fused with its deep features by the ELAFF module, so that the model recovers detail lost during downsampling. Finally, efficient local attention is added so that the model focuses more on relevant image information and is less affected by background interference. Experimental results on the PASCAL VOC 2012 dataset show that, compared with the original model, the proposed model improves the mean intersection over union (mIoU) by 2.36 percentage points and the mean pixel accuracy (MPA) by 1.60 percentage points, effectively improving segmentation performance.
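The results above are reported as mIoU and MPA. As a minimal sketch of how these two metrics are conventionally computed for semantic segmentation, the Python below accumulates a per-pixel confusion matrix and then averages per-class IoU and per-class pixel accuracy. It assumes integer class labels, treats 255 as the "ignore" label that PASCAL VOC annotations use for void/boundary pixels, and skips classes that are absent from the relevant counts. The function names are hypothetical, and the paper's own evaluation script may differ in these conventions.

```python
def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """Count pixels: cm[g][p] = number of pixels with ground truth g predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:  # assumed convention: VOC void/boundary pixels are not scored
            continue
        cm[g][p] += 1
    return cm


def miou_and_mpa(cm):
    """Return (mIoU, MPA) as fractions in [0, 1] from a square confusion matrix."""
    n = len(cm)
    ious, accs = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        if tp + fn + fp > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / (tp + fn + fp))
        if tp + fn > 0:       # per-class pixel accuracy needs the class in the ground truth
            accs.append(tp / (tp + fn))
    return sum(ious) / len(ious), sum(accs) / len(accs)


if __name__ == "__main__":
    gt = [0, 0, 1, 1]    # flattened ground-truth labels
    pred = [0, 1, 1, 1]  # flattened predicted labels
    miou, mpa = miou_and_mpa(confusion_matrix(gt, pred, num_classes=2))
    print(f"mIoU = {miou:.4f}, MPA = {mpa:.4f}")  # mIoU = 0.5833, MPA = 0.7500
```

In practice a single confusion matrix is accumulated over the whole validation set before the averages are taken. Because both metrics are already percentages, the reported gains of 2.36 and 1.60 are absolute differences (percentage points), not relative improvements.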

Key words: attention mechanism, semantic segmentation, DeepLabV3+, feature fusion
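The last component named in the abstract, efficient local attention, is commonly described as follows: the feature map is pooled along each spatial axis separately (strip pooling), a small 1D convolution is run over each pooled strip, and the result is normalized and passed through a sigmoid to produce row and column weights that rescale the input. The plain-Python sketch below follows that outline on a C×H×W feature map stored as nested lists. It is an illustration under stated assumptions, not the paper's module. The kernel is hand-picked rather than learned and shared by all channels, the group normalization usually used in this design is simplified to a per-channel standardization, and all function names are hypothetical.

```python
import math


def conv1d_same(seq, kernel):
    """Cross-correlate seq with an odd-length kernel, zero-padded so len(output) == len(seq)."""
    pad = len(kernel) // 2
    out = []
    for i in range(len(seq)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < len(seq):  # positions outside the strip count as zero
                acc += w * seq[idx]
        out.append(acc)
    return out


def standardize(seq, eps=1e-5):
    """Zero-mean, unit-variance normalization (simplified stand-in for GroupNorm)."""
    mean = sum(seq) / len(seq)
    var = sum((v - mean) ** 2 for v in seq) / len(seq)
    return [(v - mean) / math.sqrt(var + eps) for v in seq]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def efficient_local_attention(x, kernel):
    """Hypothetical ELA-style gate. x: nested list [C][H][W]; kernel: odd-length 1D weights."""
    out = []
    for channel in x:
        height, width = len(channel), len(channel[0])
        # Strip pooling: average each row over W and each column over H.
        row_means = [sum(row) / width for row in channel]
        col_means = [sum(channel[h][w] for h in range(height)) / height for w in range(width)]
        # Same local 1D conv on both strips, then normalization and a sigmoid gate in (0, 1).
        gate_h = [sigmoid(v) for v in standardize(conv1d_same(row_means, kernel))]
        gate_w = [sigmoid(v) for v in standardize(conv1d_same(col_means, kernel))]
        # Reweight every position by its row gate and its column gate.
        out.append([[channel[h][w] * gate_h[h] * gate_w[w] for w in range(width)]
                    for h in range(height)])
    return out


if __name__ == "__main__":
    feature_map = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]  # C=1, H=2, W=3
    print(efficient_local_attention(feature_map, [0.25, 0.5, 0.25]))
```

Pooling each axis separately keeps positional information along the other axis, which is why this style of attention can emphasize object regions while costing only two 1D convolutions per channel.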