Using Human Psychophysics to Evaluate Generalization in Scene Text Recognition Models

Title

Using Human Psychophysics to Evaluate Generalization in Scene Text Recognition Models

Authors

Sahar Siddiqui, Elena Sizikova, Gemma Roig, Najib J. Majaj, Denis G. Pelli

Abstract

Scene text recognition models have advanced greatly in recent years. Inspired by human reading, we characterize two important scene text recognition models by measuring their domains, i.e., the range of stimulus images that they can read. The domain specifies the ability of readers to generalize to different word lengths, fonts, and amounts of occlusion. These metrics identify strengths and weaknesses of existing models. Relative to the attention-based (Attn) model, we discover that the connectionist temporal classification (CTC) model is more robust to noise and occlusion, and better at generalizing to different word lengths. Further, we show that in both models, adding noise to training images yields better generalization to occlusion. These results demonstrate the value of testing models until they break, complementing the traditional data science focus on optimizing performance.
