• Journal of the China Computer Federation
  • Chinese Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science ›› 2026, Vol. 48 ›› Issue (4): 689-698.

• Graphics and Images •

Scene text detection based on feature enhancement and adaptively multi-scale feature fusion

LI Qiong, QI Changshi, XIE Kai

  (1. School of Electrical and Information Engineering, Wuhan Institute of Technology, Wuhan 430205;
    2. School of Electronic Information and Electrical Engineering, Yangtze University, Jingzhou 434023, China)
  • Received: 2024-07-04  Revised: 2024-11-15  Online: 2026-04-25  Published: 2026-04-30

Abstract: To address inaccurate text region localization caused by diverse text forms and complex backgrounds in natural scenes, this paper proposes a text detection algorithm based on feature enhancement and adaptive multi-scale feature fusion. First, the residual network is improved to reduce the loss of semantic information. Second, coordinate attention is applied to the extracted features to suppress redundant background information and strengthen attention to text regions, which improves the localization of text boundaries. Third, an adaptive multi-scale feature fusion module integrates learned spatial location weights into the feature maps at different scales, enabling a more comprehensive fusion of multi-scale feature information. Finally, a differentiable binarization algorithm generates the text detection results. To verify the effectiveness of the algorithm, experiments were conducted on the public datasets ICDAR2015, MSRA-TD500, and Total-Text, where it achieved F1-scores of 88.1%, 87.7%, and 86.3%, respectively. The experimental results demonstrate that the algorithm has good robustness and generalization in text detection.
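
The final step named in the abstract, differentiable binarization, can be illustrated with a minimal sketch. The abstract does not give the formula, so the code below assumes the commonly used form B = 1 / (1 + exp(-k (P - T))), where P is a predicted probability map, T is a learned threshold map, and k is an amplification factor (50 is a typical choice). The function name, the nested-list representation, and the sample values are illustrative assumptions, not the authors' implementation.

```python
import math


def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Return the approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob_map and thresh_map are nested lists of equal shape with values
    in [0, 1]. The steep sigmoid stands in for a hard threshold while
    remaining differentiable, so the threshold can be learned in training.
    """
    if len(prob_map) != len(thresh_map):
        raise ValueError("maps must have the same number of rows")
    binary_map = []
    for p_row, t_row in zip(prob_map, thresh_map):
        if len(p_row) != len(t_row):
            raise ValueError("rows must have the same length")
        binary_map.append(
            [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        )
    return binary_map


# Hypothetical 2x2 maps: pixels well above the threshold approach 1,
# pixels well below it approach 0, and pixels at the threshold give 0.5.
example_prob = [[0.9, 0.3], [0.5, 0.1]]
example_thresh = [[0.3, 0.3], [0.3, 0.3]]
example_binary = differentiable_binarization(example_prob, example_thresh)
```

With a large k the output is nearly binary away from the threshold, which is why the same map can be used for training with gradients and for extracting text regions at inference.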

Key words: scene text detection; coordinate attention; adaptively multi-scale feature fusion; differentiable binarization