DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting

Paper Title

DEER: Detection-agnostic End-to-End Recognizer for Scene Text Spotting

Authors

Kim, Seonghyeon; Shin, Seung; Kim, Yoonsik; Cho, Han-Cheol; Kil, Taeho; Surh, Jaeheung; Park, Seunghyun; Lee, Bado; Baek, Youngmin

Abstract

Recent end-to-end scene text spotters have achieved great improvement in recognizing arbitrary-shaped text instances. Common approaches for text spotting use region of interest pooling or segmentation masks to restrict features to single text instances. However, this makes it hard for the recognizer to decode correct sequences when the detection is not accurate, i.e., when one or more characters are cropped out. Considering that it is hard to accurately decide word boundaries with only the detector, we propose a novel Detection-agnostic End-to-End Recognizer (DEER) framework. The proposed method reduces the tight dependency between detection and recognition modules by bridging them with a single reference point for each text instance, instead of using detected regions. The proposed method allows the decoder to recognize the texts that are indicated by the reference point, with features from the whole image. Since only a single point is required to recognize the text, the proposed method enables text spotting without an arbitrarily-shaped detector or bounding polygon annotations. Experimental results show that the proposed method achieves competitive results on regular and arbitrarily-shaped text spotting benchmarks. Further analysis shows that DEER is robust to detection errors. The code and dataset will be publicly available.
