GTC: Guided Training of CTC Towards Efficient and Accurate Scene Text Recognition

Paper Title

GTC: Guided Training of CTC Towards Efficient and Accurate Scene Text Recognition

Authors

Wenyang Hu, Xiaocong Cai, Jun Hou, Shuai Yi, Zhiping Lin

Abstract

Connectionist Temporal Classification (CTC) and the attention mechanism are the two main approaches used in recent scene text recognition work. Compared with attention-based methods, a CTC decoder has a much shorter inference time but lower accuracy. To design an efficient and effective model, we propose the guided training of CTC (GTC), in which the CTC model learns better alignment and feature representations from more powerful attentional guidance. With the benefit of guided training, the CTC model achieves robust and accurate prediction for both regular and irregular scene text while maintaining a fast inference speed. Moreover, to further leverage the potential of the CTC decoder, a graph convolutional network (GCN) is proposed to learn the local correlations of extracted features. Extensive experiments on standard benchmarks demonstrate that our end-to-end model achieves a new state of the art for regular and irregular scene text recognition and needs 6 times shorter inference time than attention-based methods.
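
The speed advantage of CTC decoding mentioned in the abstract can be illustrated with a minimal sketch of generic greedy (best-path) CTC decoding. This is not the authors' implementation: the function name `ctc_greedy_decode`, the blank index 0, and the toy alphabet are assumptions made for illustration only. Each frame is classified independently, so no step waits on a previously emitted character, which is what an attention decoder would require.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (a convention, not from the paper)


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Greedy CTC decoding.

    frame_scores: list of per-frame score lists, one score per class.
    alphabet: dict mapping a class index to its character.
    """
    # 1. Pick the best class for every frame. Frames are independent,
    #    so this step has no sequential dependency between time steps.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Collapse runs of consecutive identical predictions.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # 3. Drop blanks; a blank between two equal labels keeps them distinct.
    return "".join(alphabet[idx] for idx in collapsed if idx != blank)


# Toy example: 7 frames over the classes {blank, "a", "b"}.
alphabet = {1: "a", 2: "b"}
frames = [
    [0.10, 0.80, 0.10],  # a
    [0.20, 0.70, 0.10],  # a (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank (separates the two a's)
    [0.10, 0.60, 0.30],  # a
    [0.10, 0.20, 0.70],  # b
    [0.20, 0.10, 0.70],  # b (repeat, collapsed)
    [0.80, 0.10, 0.10],  # blank
]
print(ctc_greedy_decode(frames, alphabet))  # prints "aab"
```

The per-frame best path here is a, a, blank, a, b, b, blank, which collapses to "aab". An attention-based decoder would instead generate the characters one at a time, each step conditioned on the previous output.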
