• Journal of the China Computer Federation (CCF)
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2023, Vol. 45, Issue (03): 495-503.

• Graphics and Images

Image semantic segmentation based on feature fusion and attention mechanism

MA Dong-mei, HUANG Xin-yue, LI Yu

  1. (School of Physics and Electronic Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Online: 2023-03-25   Published: 2023-03-23
  • Funding:
    National Natural Science Foundation of China (61961037)

Abstract: Current high-precision semantic segmentation models require substantial computing resources, which makes them difficult to deploy on embedded platforms with limited storage and computing power. To address this issue, an image semantic segmentation model based on feature fusion and an attention mechanism is proposed. First, a DeepLabV3+-based model is optimized, and its MobileNetV2 backbone is made lightweight through channel pruning. Second, a Splittable Triplet Attention (STA) module is introduced into the pruned model to strengthen the correlations among the internal dimensions of the feature maps. Finally, fine-grained upsampling modules are added to the decoder to refine edge details. In experiments on the PASCAL VOC 2012 and Cityscapes datasets, the proposed model has only 4.15×10⁶ parameters and a computational cost of 10.23 GFLOPs. It achieves a mean intersection over union (mIoU) of 70.98% and 72.26%, respectively. These results show that the model strikes a good balance among computing cost, memory footprint, and accuracy.
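The abstract does not state which criterion the authors use to decide which MobileNetV2 channels to prune. As an illustration only, the following sketch assumes the common L1-norm filter-ranking heuristic. The output channels of a convolution whose filters have the smallest L1 norms are removed, and the matching input channels are then dropped from the following layer. The function names (`select_channels`, `prune_output_channels`, `drop_input_channels`, `conv_params`) and the flat per-filter weight layout are hypothetical. This is not the authors' implementation.

```python
# Hypothetical sketch of L1-norm channel pruning; not the authors' implementation.
from typing import List, Sequence

# Assumed layout: one flat list of weights per output channel (filter), each of
# length c_in * k * k and stored channel-major (c_in blocks of k*k weights).
Filters = List[List[float]]


def l1_scores(filters: Filters) -> List[float]:
    """L1 norm of every output filter; a small norm is taken as low importance."""
    return [sum(abs(w) for w in f) for f in filters]


def select_channels(filters: Filters, keep_ratio: float) -> List[int]:
    """Return the indices of output channels to keep, in ascending order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    scores = l1_scores(filters)
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def prune_output_channels(filters: Filters, keep: Sequence[int]) -> Filters:
    """Keep only the selected filters of the pruned layer."""
    return [filters[i] for i in keep]


def drop_input_channels(filters: Filters, keep: Sequence[int], k: int) -> Filters:
    """Remove the pruned channels from the input side of the *next* layer."""
    kk = k * k
    return [[w for c in keep for w in f[c * kk:(c + 1) * kk]] for f in filters]


def conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Weight count of a standard (non-depthwise) k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


if __name__ == "__main__":
    # Toy 1x1 layer: 4 output channels, 2 input channels.
    layer = [[0.1, -0.2], [1.0, 0.5], [0.0, 0.05], [-0.7, 0.3]]
    keep = select_channels(layer, keep_ratio=0.5)
    print(keep)  # [1, 3]
    # Next 1x1 layer: 2 output channels reading the 4 channels above.
    nxt = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    print(drop_input_channels(nxt, keep, k=1))  # [[2.0, 4.0], [6.0, 8.0]]
    print(conv_params(2, 4, 1), "->", conv_params(2, len(keep), 1))  # 8 -> 4
```

A standard k×k convolution has k·k·c_in·c_out weights, so halving a layer's output channels halves its weights. It also shrinks the next layer's input, which is how pruning lowers both the parameter count and the FLOPs. In a MobileNetV2 inverted residual block, removing a channel from the 1×1 expansion layer also removes the matching depthwise filter and the matching input channel of the 1×1 projection layer.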

Key words: image processing, semantic segmentation, DeepLabV3+, channel pruning, splittable triplet attention, fine-grained upsampling
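The accuracy figures in the abstract (70.98% on PASCAL VOC 2012 and 72.26% on Cityscapes) are mean intersection over union (mIoU) scores. The following minimal pure-Python sketch shows how mIoU is commonly computed:

- a confusion matrix is accumulated over all evaluated pixels;
- each class's IoU is TP / (TP + FP + FN);
- the per-class values are averaged.

The ignore label 255 and the function names are assumptions for illustration. This is not the authors' evaluation script, and benchmark toolkits may differ in details such as how classes absent from both maps are treated.

```python
# Minimal mIoU sketch over flattened label maps; not the authors' evaluation code.
from typing import List, Optional, Sequence

IGNORE_INDEX = 255  # assumed "void" label, a common convention for VOC/Cityscapes


def confusion_matrix(pred: Sequence[int], target: Sequence[int], num_classes: int,
                     ignore_index: Optional[int] = IGNORE_INDEX) -> List[List[int]]:
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not count
        cm[t][p] += 1
    return cm


def add_into(total: List[List[int]], cm: List[List[int]]) -> None:
    """Accumulate one image's confusion matrix into a dataset-level total."""
    for r, row in enumerate(cm):
        for c, v in enumerate(row):
            total[r][c] += v


def mean_iou(cm: List[List[int]]) -> float:
    """Mean of TP / (TP + FP + FN) over classes present in pred or target."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        if tp + fp + fn > 0:  # classes absent from both maps are skipped
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]  # last pixel is ignored
    cm = confusion_matrix(pred, target, num_classes=3)
    print(cm)                      # [[1, 0, 0], [1, 2, 0], [0, 0, 1]]
    print(round(mean_iou(cm), 4))  # 0.7222
```

Accumulating a single confusion matrix over the whole validation set with `add_into` and computing the mean only once is the usual practice. Averaging per-image mIoU values generally gives a different number.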