Towards Open-Set Text Recognition via Label-to-Prototype Learning

Paper Title

Towards Open-Set Text Recognition via Label-to-Prototype Learning

Authors

Chang Liu, Chun Yang, Hai-Bo Qin, Xiaobin Zhu, Cheng-Lin Liu, Xu-Cheng Yin

Abstract

Scene text recognition is a popular topic and is used extensively in industry. Although many methods achieve satisfactory performance on close-set text recognition challenges, they lose feasibility in open-set scenarios, where collecting data or retraining models for novel characters can be costly. For example, annotating samples for foreign languages can be expensive, and retraining the model each time a novel character is discovered in historical documents costs both time and resources. In this paper, we introduce and formulate a new open-set text recognition task, which demands the capability to spot and recognize novel characters without retraining. A label-to-prototype learning framework is also proposed as a baseline for this task. Specifically, the framework introduces a generalizable label-to-prototype mapping function to build prototypes (class centers) for both seen and unseen classes. An open-set predictor is then used to recognize or reject samples according to the prototypes. The rejection capability over out-of-set characters allows unknown characters to be spotted automatically in the incoming data stream. Extensive experiments show that our method achieves promising performance on a variety of zero-shot, close-set, and open-set text recognition datasets.
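To make the "recognize or reject according to the prototypes" step concrete, here is a minimal sketch of prototype matching with rejection. It is not the paper's implementation: the cosine-similarity score, the fixed `threshold`, the function names, and the hand-written 2-D prototype vectors (standing in for the output of a label-to-prototype mapping) are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def predict_open_set(feature, prototypes, threshold=0.8):
    """Return the best-matching label, or None to reject the sample.

    `prototypes` maps each label to its class-center vector. The
    threshold value is an illustrative assumption, not a tuned setting.
    """
    best_label, best_score = None, -1.0
    for label, proto in prototypes.items():
        score = cosine(feature, proto)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# Toy prototypes; a real system would derive these from the labels.
prototypes = {"A": [1.0, 0.0], "B": [0.0, 1.0]}

known = predict_open_set([0.9, 0.1], prototypes)    # close to "A"
unknown = predict_open_set([0.7, 0.7], prototypes)  # far from both, rejected

# Registering a prototype for a new class needs no retraining in this sketch.
prototypes["C"] = [0.7, 0.7]
newly_known = predict_open_set([0.7, 0.7], prototypes)
```

In this sketch a rejected sample (`None`) corresponds to a spotted out-of-set character, and adding an entry to `prototypes` is how a novel character becomes recognizable without retraining the feature extractor.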
