Computer Engineering & Science, 2021, Vol. 43, Issue (04): 712-720.

• Graphics and Images •

An image semantic segmentation method based on path aggregation Atrous convolutional network

LI Shu-ao, XIE Qing, MA Yan-chun, LIU Yong-jian

  1. (School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China)
  • Received: 2020-01-15 Revised: 2020-06-18 Accepted: 2021-04-25 Online: 2021-04-25 Published: 2021-04-21
  • Funding:
    National Natural Science Foundation of China (61602353)

Abstract: Deep fully convolutional neural networks based on the encoder-decoder structure have made significant progress in image semantic segmentation. However, in a deep network the path along which low-level localization information propagates to the high-level layers is too long, which makes it difficult for the decoder to use that information to restore object boundary structure. To address this problem, a path aggregation structure applied to the decoder part of the segmentation network is proposed. This structure shortens the propagation path from low-level to high-level information in the segmentation network and provides multi-scale contextual semantic information, so that the network produces more refined boundary segmentation results. In addition, because the softmax cross-entropy loss function commonly used in semantic segmentation discriminates poorly between samples with similar appearance, the softmax cross-entropy loss is modified and a bidirectional cross-entropy loss function is proposed. Combined with the new loss function, the proposed path aggregation atrous convolutional network achieves better results on the PASCAL VOC2012Aug dataset, raising the mIoU from 78.77% to 80.44%.
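The abstract reports its result as mIoU (mean intersection over union). As a reading aid, the metric can be sketched as below. This is a minimal illustration of the standard definition, not the authors' evaluation code: the function name `mean_iou`, the flat label lists, and the use of 255 as an ignored "void" label (the usual PASCAL VOC convention) are assumptions made for this example.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection over union across classes present in pred or target.

    pred and target are flat sequences of integer class labels (hypothetical
    input format chosen for this sketch).
    """
    # conf[t][p] counts pixels whose true class is t and predicted class is p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # assumed convention: skip unlabeled (void) pixels
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]                                          # true positives
        fn = sum(conf[c]) - tp                                   # missed pixels of class c
        fp = sum(conf[r][c] for r in range(num_classes)) - tp    # wrongly labeled as c
        union = tp + fp + fn
        if union > 0:  # classes absent from both pred and target are skipped
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
print(mean_iou(pred, target, num_classes=3))
```

In the toy example the per-class IoU values are 1/2, 2/3, and 1, so the mean is 13/18. For a dataset-level score, the confusion matrix is normally accumulated over all images before the per-class ratios are taken, rather than averaging per-image values.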


Key words: image semantic segmentation; bidirectional cross-entropy; path aggregation structure; multi-scale prediction; deep learning