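• Journal of the China Computer Federation
• Chinese Science and Technology Core Journal
• Chinese Core Journal

Computer Engineering & Science

• Graphics and Images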

Scene text detection based on perpendicular regional regression networks

YANG Guoliang, WANG Zhiyuan, ZHANG Yu, KANG Lele, HU Zhengwei

  1. (School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China)
  • Received: 2017-03-20 Revised: 2017-05-09 Online: 2018-07-25 Published: 2018-07-25
  • Supported by:

    National Natural Science Foundation of China (51365017, 61305019)

Abstract:

Text detection in natural scenes differs from traditional object detection, so directly applying the region proposal network (RPN) method proposed in Faster R-CNN to text detection has limitations. On the one hand, because text regions vary in length, have complex backgrounds, and are highly diverse, the network must be designed with a larger receptive field. On the other hand, in the RPN training phase, the selection of positive samples produces a large number of false positives and missed detections.

To address this, we propose a method based on perpendicular regional regression networks. First, the Hough method is used to correct the skew of some of the scene images as a preprocessing step. Second, in the training phase, candidate boxes (anchors) whose IoU (the ratio of intersection to union) with the ground-truth box in the vertical direction exceeds a threshold are selected as positive samples, and the positive samples are classified and regressed in the vertical direction. Finally, multiple adjacent anchors are merged to form a text region. Experimental results show that the method achieves good detection performance on the ICDAR2011 and ICDAR2013 datasets.

Key words: text detection, receptive field, diversification, perpendicular regional regression network
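
The positive-sample selection and anchor-merging steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the box layout `(x1, y1, x2, y2)`, the function names, the threshold values, the extra requirement that an anchor overlap the ground-truth box horizontally, and the adjacency rule used for merging are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List, Tuple

# Hypothetical box layout (an assumption): (x1, y1, x2, y2), y grows downward.
Box = Tuple[float, float, float, float]


def vertical_iou(a: Box, b: Box) -> float:
    """IoU computed on the vertical extents [y1, y2] only; x is ignored."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    if inter <= 0:
        return 0.0
    # For overlapping intervals the union is the span from lowest y1 to highest y2.
    union = max(a[3], b[3]) - min(a[1], b[1])
    return inter / union


def overlaps_horizontally(a: Box, b: Box) -> bool:
    """Assumed extra condition: the anchor must share some x-range with the box."""
    return min(a[2], b[2]) > max(a[0], b[0])


def select_positive_anchors(anchors: List[Box], gt: Box,
                            threshold: float = 0.7) -> List[Box]:
    """Keep anchors whose vertical IoU with the ground-truth box exceeds threshold.

    The value 0.7 is a placeholder, not a figure taken from the paper.
    """
    return [an for an in anchors
            if overlaps_horizontally(an, gt) and vertical_iou(an, gt) > threshold]


def merge_adjacent_anchors(anchors: List[Box], max_gap: float = 0.0,
                           min_v_iou: float = 0.7) -> List[Box]:
    """Greedily merge anchors, left to right, into text regions.

    Assumed adjacency rule: an anchor joins a region when its left edge lies
    within max_gap of the region's right edge and the two are vertically aligned.
    """
    regions: List[Box] = []
    for an in sorted(anchors, key=lambda b: b[0]):
        for i, reg in enumerate(regions):
            if an[0] - reg[2] <= max_gap and vertical_iou(an, reg) >= min_v_iou:
                regions[i] = (reg[0], min(reg[1], an[1]),
                              max(reg[2], an[2]), max(reg[3], an[3]))
                break
        else:
            regions.append(an)
    return regions


# Usage with made-up coordinates: one ground-truth text line, 16-pixel-wide anchors.
gt = (10, 20, 58, 40)
anchors = [(0, 20, 16, 40), (16, 22, 32, 41), (32, 19, 48, 39),
           (48, 0, 64, 15), (64, 20, 80, 40), (16, 30, 32, 60)]
positives = select_positive_anchors(anchors, gt)
print(merge_adjacent_anchors(positives))  # [(0, 19, 48, 41)]
```

Measuring overlap only along the vertical axis means a narrow, fixed-width anchor is not penalised for covering just a small horizontal slice of a long text line, which is one plausible reading of why the abstract selects positive samples this way.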