• Journal of the China Computer Federation
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (1): 108-118.

• Graphics and Images •

A multi-path and multi-scale attention network for land cover segmentation

LI Yan, FAN Xinyu, CHEN Qin

  1. (School of Automation, Nanjing University of Information Science & Technology, Nanjing 210044, China)

  • Received: 2024-05-08 Revised: 2024-06-28 Online: 2026-01-25 Published: 2026-01-25
  • Funding:
    National Natural Science Foundation of China (42305158)

Abstract: In recent years, Transformers and their variants have made remarkable progress in image recognition, yet they still face challenges in pixel-level segmentation tasks, primarily because they do not handle local bias explicitly or effectively enough. To address this issue, this paper proposes a multi-path and multi-scale attention network named DMANet. In the encoding phase, the network combines the strengths of convolutional neural networks (CNNs) and Transformers, capturing both fine-grained local information and broad global context from images and thereby improving feature extraction. The proposed interactive dual-branch structure strengthens feature integration and improves the model's performance in dense prediction tasks. In the decoding phase, cross-layer feature fusion enhances DMANet's ability to recognize complex objects. Experiments on the Potsdam, GID-15, and L8 SPARCS datasets demonstrate DMANet's excellent performance and broad applicability in complex land cover segmentation tasks.

Key words: Transformer structure, semantic segmentation, multi-path and multi-scale, convolutional neural network, land cover