• Journal of the China Computer Federation (CCF)
  • China Science and Technology Core Journal
  • Chinese Core Journal

Computer Engineering & Science, 2023, Vol. 45, Issue (11): 2008-2017.

• Graphics and Images

Research on an Improved DBNet Text Detection Algorithm for E-commerce Images

LI Zhuo-xuan, ZHOU Ya-tong

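  1. (School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300401, China)
  • Received: 2022-09-02 Revised: 2022-12-14 Accepted: 2023-11-25 Online: 2023-11-25 Published: 2023-11-16
  • Funding:
    Beijing-Tianjin-Hebei Basic Research Cooperation Special Project (H2021202008, J210008); Open Project of the Inner Mongolia Autonomous Region Discipline Inspection and Supervision Big Data Laboratory (IMDBD202105)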

iSFF-DBNet: An improved text detection algorithm in e-commerce images


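Abstract:

Backgrounds in e-commerce images are often complex and text regions vary widely in shape, so existing text detection models cannot locate text accurately. To address this problem, an improved text detection model, Iterative Self-selective Feature Fusion DBNet (iSFF-DBNet), is proposed. First, after the backbone network extracts features, an attention mechanism is introduced while building the Feature Pyramid Network (FPN). Second, an iterative self-selective feature fusion (iSFF) module is proposed to strengthen the model's feature extraction ability. Finally, a bilateral upsampling module is introduced to improve the adaptivity of the differentiable binarization module. Experimental results on the text detection task of the ICPR MTWI 2018 web image dataset show that, compared with the standard DBNet model, the proposed model improves recall and F-score by 6.0% and 2.4%, respectively. Compared with other text detection models, it achieves a balance between precision and recall and detects text more accurately.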

Keywords: text detection, multi-scale features, feature fusion, deep learning
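The differentiable binarization (DB) module that iSFF-DBNet refines comes from the original DBNet (Liao et al., 2020), which replaces a hard threshold with the approximate step function B = 1 / (1 + exp(-k(P - T))), where P is the probability map, T the learned threshold map, and k an amplifying factor (50 in that paper). The NumPy sketch below only illustrates this baseline formula and its gradient; it does not model the bilateral upsampling change described in the abstract, and the function names are our own.

```python
import numpy as np


def approximate_binarization(prob_map: np.ndarray, thresh_map: np.ndarray, k: float = 50.0) -> np.ndarray:
    """DBNet's approximate step function: B = 1 / (1 + exp(-k * (P - T))).

    prob_map (P) and thresh_map (T) share a shape, with values in [0, 1];
    k is the amplifying factor (50 in the original DBNet paper).
    """
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))


def binarization_grad(prob_map: np.ndarray, thresh_map: np.ndarray, k: float = 50.0) -> np.ndarray:
    """Derivative of B with respect to P: k * B * (1 - B), maximal (k / 4) where P == T."""
    b = approximate_binarization(prob_map, thresh_map, k)
    return k * b * (1.0 - b)


if __name__ == "__main__":
    P = np.array([0.9, 0.5, 0.1])  # toy probability values
    T = np.array([0.3, 0.5, 0.3])  # toy threshold values
    print(approximate_binarization(P, T))  # close to 1, exactly 0.5, close to 0
    print(binarization_grad(P, T))         # middle entry is 12.5
```

Because the gradient k·B·(1 − B) is largest (k/4) where P equals T, a larger k sharpens the step toward a hard threshold while keeping it differentiable, which is what lets the threshold map be learned end to end.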

Abstract: Existing text detection models cannot accurately locate text in e-commerce images, whose backgrounds are complex and whose text regions vary in shape. To address this problem, an improved text detection model, Iterative Self-selective Feature Fusion DBNet (iSFF-DBNet), is proposed. First, after features are extracted by the backbone network, an attention mechanism is introduced when building the Feature Pyramid Network (FPN). Then, an Iterative Self-selective Feature Fusion (iSFF) module is proposed to enhance the feature extraction ability of the model. Finally, a bilinear upsampling module is introduced to improve the adaptive performance of the differentiable binarization module. Experimental results show that, compared with the standard DBNet model, the improved model increases recall and F-score by 6.0% and 2.4%, respectively, on the text detection task of the ICPR MTWI 2018 web image dataset. Compared with other text detection models, the proposed model achieves a balance between precision and recall and detects text more accurately.
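The abstract does not say how the iSFF module computes its selection weights. As a purely hypothetical illustration, the sketch below follows the two-stage pattern of iterative attentional feature fusion (iAFF, Dai et al., 2021): weights computed from an initial fusion of two feature maps are used to re-weight the original inputs a second time. The parameter-free channel weighting (global average pooling followed by a sigmoid) and all names here are assumptions, not the authors' design; a trained network would learn these weights.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def selection_weights(feat: np.ndarray) -> np.ndarray:
    """Hypothetical, parameter-free channel weights.

    feat has shape (C, H, W). Global average pooling over H and W, then a
    sigmoid, gives one weight in (0, 1) per channel, shaped (C, 1, 1) so it
    broadcasts over the spatial dimensions.
    """
    return sigmoid(feat.mean(axis=(1, 2), keepdims=True))


def iterative_selective_fusion(x: np.ndarray, y: np.ndarray, iterations: int = 2) -> np.ndarray:
    """Fuse two same-shaped feature maps (e.g. an upsampled FPN level and a lateral one).

    Each pass derives weights M from the current fused estimate and returns
    M * x + (1 - M) * y, so every output value is a convex combination of x and y.
    The first pass uses the plain sum x + y as its estimate.
    """
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")
    fused = x + y
    for _ in range(iterations):
        m = selection_weights(fused)
        fused = m * x + (1.0 - m) * y
    return fused


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 8))  # toy feature map: 4 channels, 8x8
    y = rng.standard_normal((4, 8, 8))
    print(iterative_selective_fusion(x, y).shape)  # (4, 8, 8)
```

Weighting with M and 1 − M, rather than adding the two maps, lets each channel lean toward whichever input the weights favour; replacing `selection_weights` with a learnable attention block would be the natural next step in a real model.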
