• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2024, Vol. 46 ›› Issue (08): 1503-1512.

• Artificial Intelligence and Data Mining •

A lightweight semantic segmentation based on attention mechanism

MA Dong-mei, WANG Peng-yu, GUO Zhi-hao

  1. (School of Physics & Electronic Engineering, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2023-03-24 Revised: 2023-09-12 Accepted: 2024-08-25 Online: 2024-08-25 Published: 2024-09-02
  • Supported by:
    National Natural Science Foundation of China (61961037)

Abstract: Semantic segmentation is a computer vision technique that extracts key information from large numbers of images and uses masks to turn that information into a clearer, more easily understood representation. Keeping a model as small as possible while preserving its accuracy is a trade-off that researchers are actively pursuing, and it is currently a hot topic in the design of lightweight network models. Image semantic segmentation still faces many challenges, such as discontinuous segmentation, incorrect segmentation, and excessive model complexity. To address these problems, a lightweight semantic segmentation model based on the attention mechanism is proposed. The model is trained with a freeze-unfreeze schedule and uses MobileNetV2 as its feature extraction network. To recover clearer object boundaries, either a lightweight convolutional block attention module (CBAM) is introduced at the output of the atrous spatial pyramid pooling (ASPP) module, or channel attention (ECA-Net) is introduced in the decoder. To handle class imbalance among samples, the focal loss function (focal_loss) is introduced. Mixed-precision training is used, and the standard convolution at the output end is replaced with DO-Conv. Experiments on the PASCAL VOC 2012 and Cityscapes datasets show a model size of 23.6 MB, mean intersection over union (mIoU) of 73.91% and 74.89%, respectively, and class-wise mean pixel accuracy of 82.88% and 84.87%, respectively, achieving a balance between segmentation accuracy and computational efficiency.


Key words: semantic segmentation; DeepLabV3+; MobileNetV2; CBAM; channel attention
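
The abstract names a focal loss and reports two metrics, mIoU and class-wise mean pixel accuracy, without giving their formulas. The sketch below shows the standard textbook definitions of these quantities in plain Python. It is an illustration only and not the authors' implementation: the function names, the `gamma` and `alpha` defaults, and the choice to skip classes absent from the ground truth are all assumptions made here.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for a single pixel (standard form, assumed here).

    probs  : predicted class probabilities for the pixel (sum to 1)
    target : index of the ground-truth class
    gamma, alpha : illustrative defaults, not values taken from the paper
    """
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    # (1 - p_t) ** gamma down-weights pixels that are already well classified.
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def confusion_matrix(pred, truth, num_classes):
    """cm[i][j] counts pixels whose true class is i and predicted class is j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, truth):
        cm[t][p] += 1
    return cm


def miou_and_mpa(cm):
    """Return (mean IoU, class-wise mean pixel accuracy) from a confusion matrix."""
    n = len(cm)
    ious, accs = [], []
    for c in range(n):
        tp = cm[c][c]
        gt = sum(cm[c])                              # pixels truly of class c
        predicted = sum(cm[r][c] for r in range(n))  # pixels predicted as class c
        union = gt + predicted - tp
        if union > 0:
            ious.append(tp / union)
        if gt > 0:  # classes absent from the ground truth are skipped
            accs.append(tp / gt)
    miou = sum(ious) / len(ious) if ious else 0.0
    mpa = sum(accs) / len(accs) if accs else 0.0
    return miou, mpa


if __name__ == "__main__":
    # Toy example: four pixels, two classes, flattened label maps.
    truth = [0, 0, 1, 1]
    pred = [0, 1, 1, 1]
    miou, mpa = miou_and_mpa(confusion_matrix(pred, truth, num_classes=2))
    print(f"mIoU={miou:.4f}  mPA={mpa:.4f}")
    print(f"focal loss at p_t=0.9: {focal_loss([0.1, 0.9], 1):.6f}")
```

With `gamma=0` and `alpha=1` the loss reduces to ordinary cross-entropy, which makes the role of the modulating factor easy to check. In practice the per-pixel loss would be averaged over all pixels in a batch, and the confusion matrix would be accumulated over the whole validation set before the metrics are computed.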