Scene text detection based on feature enhancement and adaptively multi-scale feature fusion

doi:10.3969/j.issn.1007-130X.2026.04.013

Computer Engineering & Science, 2026, Vol. 48, Issue 4: 689-698.

LI Qiong, QI Changshi, XIE Kai

(1. School of Electrical and Information Engineering, Wuhan Institute of Technology, Wuhan 430205, China; 2. School of Electronic Information and Electrical Engineering, Yangtze University, Jingzhou 434023, China)

Received: 2024-07-04  Revised: 2024-11-15  Online: 2026-04-25  Published: 2026-04-30

Funding: National Natural Science Foundation of China (62272485); Jiangxi Province Training Program for Academic and Technical Leaders of Major Disciplines (20204BCJ22014)

Abstract

To address inaccurate text region localization caused by the diverse forms of text and the complex, variable backgrounds found in natural scenes, this paper proposes a text detection algorithm based on feature enhancement and adaptive multi-scale feature fusion. First, the residual network is improved to reduce the loss of semantic information. Second, coordinate attention is embedded into the extracted features to suppress redundant background information and strengthen attention to text regions, thereby improving the localization of text boundaries. Third, an adaptive multi-scale feature fusion module integrates learned spatial position weights into the feature maps at different scales, so that multi-scale feature information is fused more fully. Finally, a differentiable binarization algorithm generates the text detection results. To verify the effectiveness of the algorithm, experiments were conducted on the public datasets ICDAR2015, MSRA-TD500, and Total-Text, where it achieved F1-scores of 88.1%, 87.7%, and 86.3%, respectively. The experimental results show that the algorithm has good robustness and generalization in text detection.
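The abstract names differentiable binarization as the final step but does not spell out its form. The sketch below assumes the commonly used formulation, in which a hard threshold is replaced by a steep sigmoid, B = 1 / (1 + exp(-k (P - T))), where P is the predicted text probability and T is a learned per-pixel threshold. The function names and the default amplification factor k = 50 are illustrative assumptions, not details taken from this paper.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Soft step function: 1 / (1 + exp(-k * (prob - thresh))).

    prob   -- predicted text probability for one pixel, in [0, 1]
    thresh -- learned threshold for the same pixel, in [0, 1]
    k      -- amplification factor (assumed value; larger means steeper)
    """
    return 1.0 / (1.0 + math.exp(-k * (prob - thresh)))


def binarize_map(prob_map, thresh_map, k=50.0):
    """Apply the soft step pixel by pixel to two H x W nested lists."""
    return [
        [differentiable_binarization(p, t, k) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


# Hypothetical 1 x 3 probability and threshold maps.
approx_binary = binarize_map([[0.9, 0.5, 0.1]], [[0.3, 0.5, 0.3]])
```

Because the function is smooth, gradients can flow through the binarization step during training, which a hard threshold would block.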

Keywords: scene text detection; coordinate attention; adaptive multi-scale feature fusion; differentiable binarization
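The adaptive multi-scale feature fusion step can be illustrated with a minimal sketch. It assumes that the feature maps from each scale have already been resized to a common resolution and that the learned spatial position weights are normalized with a softmax across scales at every position, so they sum to 1. Both assumptions, and all names below, are illustrative; the abstract does not specify these details.

```python
import math


def softmax(values):
    """Numerically stable softmax over a short list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(features, weight_logits):
    """Fuse same-sized single-channel feature maps with per-position weights.

    features      -- list (one entry per scale) of H x W nested lists
    weight_logits -- list (one entry per scale) of H x W nested lists holding
                     the unnormalized spatial weights for that scale
    """
    height, width = len(features[0]), len(features[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            # Normalize the weights of all scales at position (i, j).
            weights = softmax([logits[i][j] for logits in weight_logits])
            fused[i][j] = sum(w * f[i][j] for w, f in zip(weights, features))
    return fused


# Two hypothetical 1 x 2 feature maps with equal weight logits.
fused_map = adaptive_fuse(
    [[[1.0, 2.0]], [[3.0, 4.0]]],
    [[[0.0, 0.0]], [[0.0, 0.0]]],
)
```

With equal logits the result is a plain average of the scales; during training, the logits would instead be predicted by the network so that each position can favor the scale that best describes the text there.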

LI Qiong, QI Changshi, XIE Kai. Scene text detection based on feature enhancement and adaptively multi-scale feature fusion[J]. Computer Engineering & Science, 2026, 48(4): 689-698.

