Paper Title
Transferring General Multimodal Pretrained Models to Text Recognition
Paper Authors
Paper Abstract
This paper proposes a new method, OFA-OCR, for transferring multimodal pretrained models to text recognition. Specifically, we recast text recognition as image captioning and directly transfer a unified vision-language pretrained model to the end task. Without pretraining on large-scale annotated or synthetic text recognition data, OFA-OCR outperforms the baselines and achieves state-of-the-art performance on the Chinese text recognition benchmark. Additionally, we construct an OCR pipeline with OFA-OCR and demonstrate that it achieves performance competitive with a product-level API. The code (https://github.com/OFA-Sys/OFA) and demo (https://modelscope.cn/studios/damo/ofa_ocr_pipeline/summary) are publicly available.
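The core idea of the abstract, recasting text recognition as image captioning, can be illustrated with a minimal toy sketch. The stub scorer below stands in for the OFA vision-language model (its name and behavior are hypothetical, purely for illustration); the point is that the recognized text is produced by the same greedy autoregressive decoding loop used for captioning, with the "caption" being the transcription.

```python
# Toy sketch: text recognition framed as image captioning.
# In OFA-OCR a vision-language transformer would replace
# `score_next_token`; here a stub maps fake "image features"
# (the ground-truth string itself) to per-character scores so
# the decoding loop is self-contained and runnable.

def score_next_token(image_features, prefix, vocab):
    # Hypothetical stub scorer: assigns probability mass to the
    # next unread character encoded in the fake image features.
    idx = len(prefix)
    scores = {tok: 0.0 for tok in vocab}
    if idx < len(image_features):
        scores[image_features[idx]] = 1.0
    else:
        scores["<eos>"] = 1.0
    return scores

def caption_decode(image_features, vocab, max_len=32):
    # Greedy autoregressive decoding, exactly as in image
    # captioning: emit one token at a time until <eos>.
    prefix = []
    for _ in range(max_len):
        scores = score_next_token(image_features, prefix, vocab)
        token = max(scores, key=scores.get)
        if token == "<eos>":
            break
        prefix.append(token)
    return "".join(prefix)

vocab = list("文本识别") + ["<eos>"]
print(caption_decode("文本识别", vocab))  # prints 文本识别
```

The design choice this mirrors is that no recognition-specific decoder head is needed: the captioning interface of the pretrained vision-language model is reused unchanged for the OCR end task.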