To address the mis-segmentation, low segmentation accuracy, and severe loss of detail commonly encountered in the existing DeepLabV3+ semantic segmentation model, a feature-fusion semantic segmentation model based on an attention mechanism is proposed. First, a switchable atrous convolution is cascaded within the dilated convolution branch of the model, enabling it to adapt more flexibly to features at different scales and thereby reducing mis-segmentation. Second, an RFEM module is introduced to capture multi-scale information from shallow features and dependencies across different ranges, enhancing the model’s performance. Third, intermediate-layer features of the model are extracted and fused with its deep features using the ELAFF module, enabling the model to recover detailed information lost during downsampling. Finally, an efficient local attention mechanism is added to make the model focus more on image information and reduce background interference. Experimental results on the PASCAL VOC 2012 dataset demonstrate that, compared to the original model, the proposed model achieves a 2.36 percentage-point increase in mean intersection-over-union (mIoU) and a 1.60 percentage-point improvement in mean pixel accuracy (MPA), effectively enhancing segmentation performance.
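For reference, the two reported metrics are commonly computed from a per-class confusion matrix. The sketch below illustrates the usual definitions in plain Python; it is not the evaluation code used in this work. The function names are hypothetical, and the handling of absent classes (skipped rather than counted as zero) is an assumption that evaluation toolkits do not all share.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: cm[t][p] = pixels with ground truth t predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN).

    Assumption: classes absent from both labels and predictions are skipped.
    """
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                  # ground truth c, predicted otherwise
        fp = sum(row[c] for row in cm) - tp   # predicted c, ground truth otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


def mean_pixel_accuracy(cm):
    """Mean over classes of TP / (ground-truth pixels of that class).

    Assumption: classes with no ground-truth pixels are skipped.
    """
    accs = []
    for c in range(len(cm)):
        total = sum(cm[c])
        if total > 0:
            accs.append(cm[c][c] / total)
    return sum(accs) / len(accs) if accs else 0.0


# Toy example: four flattened pixels, two classes.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(f"mIoU = {mean_iou(cm):.4f}")            # mIoU = 0.5833
print(f"MPA  = {mean_pixel_accuracy(cm):.4f}")  # MPA  = 0.7500
```

Under these definitions, a gain of 2.36 percentage points in mIoU means the class-averaged IoU, expressed as a percentage, rises by 2.36 in absolute terms rather than by 2.36% relative to the baseline.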