Paper Title
TextMatcher: Cross-Attentional Neural Network to Compare Image and Text
Paper Authors
Paper Abstract
We study a novel multimodal-learning problem, which we call text matching: given an image containing a single-line text and a candidate text transcription, the goal is to assess whether the text represented in the image corresponds to the candidate text. We devise the first machine-learning model specifically designed for this problem. The proposed model, termed TextMatcher, compares the two inputs by applying a cross-attention mechanism over the embedding representations of image and text, and it is trained in an end-to-end fashion. We extensively evaluate the empirical performance of TextMatcher on the popular IAM dataset. Results attest that, compared to a baseline and existing models designed for related problems, TextMatcher achieves higher performance on a variety of configurations, while at the same time running faster at inference time. We also showcase TextMatcher in a real-world application scenario concerning the automatic processing of bank cheques.
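To make the idea of comparing the two modalities via cross-attention concrete, below is a minimal PyTorch sketch of the mechanism described in the abstract. This is not the authors' TextMatcher implementation: the embedding dimension, the use of `nn.MultiheadAttention`, and the pooling/classification head are illustrative assumptions; in practice the image and text embeddings would come from dedicated encoders.

```python
import torch
import torch.nn as nn

class CrossAttentionMatcher(nn.Module):
    """Hypothetical sketch: compare image and text embeddings with cross-attention.

    Not the paper's architecture; dimensions, attention module, and the binary
    classification head are assumptions made for illustration only.
    """

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        # Text positions attend to image positions (queries = text, keys/values = image).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Binary head: does the candidate text match the text shown in the image?
        self.classifier = nn.Linear(dim, 1)

    def forward(self, image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # image_emb: (batch, image_positions, dim), e.g. columns of a CNN feature map
        # text_emb:  (batch, text_length, dim),     e.g. character embeddings
        attended, _ = self.cross_attn(query=text_emb, key=image_emb, value=image_emb)
        pooled = attended.mean(dim=1)                # aggregate over text positions
        return torch.sigmoid(self.classifier(pooled)).squeeze(-1)  # match probability


# Toy usage with random tensors standing in for real image/text encoder outputs.
model = CrossAttentionMatcher()
image_emb = torch.randn(2, 32, 256)
text_emb = torch.randn(2, 10, 256)
print(model(image_emb, text_emb).shape)  # torch.Size([2])
```

Such a model can be trained end-to-end with a binary cross-entropy loss on matching and non-matching (image, text) pairs, which is consistent with the matching objective the abstract describes.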