• Official journal of the China Computer Federation
  • Core Journal of Chinese Science and Technology
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (4): 689-698.

• Graphics and Images •

Scene text detection based on feature enhancement and adaptively multi-scale feature fusion

LI Qiong, QI Changshi, XIE Kai

  1. School of Electrical and Information Engineering, Wuhan Institute of Technology, Wuhan 430205, Hubei, China
  2. School of Electronic Information and Electrical Engineering, Yangtze University, Jingzhou 434023, Hubei, China

  • Received: 2024-07-04 Revised: 2024-11-15 Online: 2026-04-25 Published: 2026-04-30
  • Supported by:
    National Natural Science Foundation of China (62272485); Training Program for Academic and Technical Leaders of Major Disciplines in Jiangxi Province (20204BCJ22014)

Abstract: To address inaccurate text region localization caused by diverse text shapes and complex, variable backgrounds in natural scenes, this paper proposes a text detection algorithm based on feature enhancement and adaptive multi-scale feature fusion. First, the residual network is improved to reduce the loss of semantic information. Second, coordinate attention is embedded into the extracted features to suppress redundant background information and strengthen the focus on text regions, thereby improving the localization of text boundaries. Third, an adaptive multi-scale feature fusion module integrates learned spatial location weights into feature maps at different scales, so that multi-scale feature information is fused more fully. Finally, a differentiable binarization algorithm is used to generate the text detection results. To verify the effectiveness of the algorithm, experiments were conducted on the public datasets ICDAR2015, MSRA-TD500, and Total-Text, where it achieved F1-scores of 88.1%, 87.7%, and 86.3%, respectively. The experimental results show that the algorithm has good robustness and generalization in text detection.

Key words: scene text detection; coordinate attention; adaptive multi-scale feature fusion; differentiable binarization
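
The final step of the abstract, differentiable binarization, can be illustrated with a minimal sketch. The abstract does not give the formula, so this assumes the commonly published form B = 1 / (1 + exp(-k(P - T))), where P is a probability map, T is a learned threshold map, and k is an amplifying factor. The function name `differentiable_binarization`, the value k = 50, and the toy 2x2 maps are assumptions for illustration, not details taken from the paper.

```python
import math

def differentiable_binarization(prob, thresh, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob and thresh are same-shaped 2D lists with values in [0, 1].
    k = 50 is an assumed amplifying factor, not a value from the paper.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(prow, trow)]
        for prow, trow in zip(prob, thresh)
    ]

# Toy probability and threshold maps (hypothetical values).
prob = [[0.9, 0.2], [0.5, 0.35]]
thresh = [[0.3, 0.3], [0.5, 0.3]]

approx = differentiable_binarization(prob, thresh)
for row in approx:
    print([round(v, 3) for v in row])
```

Unlike a hard threshold, this steep sigmoid is differentiable, so under the stated assumptions the threshold map can be trained jointly with the probability map: pixels well above their local threshold map to values near 1, and pixels well below map to values near 0.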