Paper Title

An End-to-End OCR Framework for Robust Arabic-Handwriting Recognition using a Novel Transformers-based Model and an Innovative 270 Million-Words Multi-Font Corpus of Classical Arabic with Diacritics

Authors

Mostafa, Aly; Mohamed, Omar; Ashraf, Ali; Elbehery, Ahmed; Jamal, Salma; Salah, Anas; Ghoneim, Amr S.

Abstract

This research is the second phase in a series of investigations on Optical Character Recognition (OCR) of Arabic historical documents, examining how different modeling procedures interact with the problem. The first study examined the effect of Transformers on our custom-built Arabic dataset. One of its drawbacks was the size of the training data: due to lack of resources, only 15,000 of our 30 million images were used. We also add an image enhancement layer, time and space optimization, and a post-correction layer to help the model predict the correct word for the context. Notably, we propose an end-to-end text recognition approach that uses a Vision Transformer, namely BEIT, as the encoder and a vanilla Transformer as the decoder, eliminating CNNs for feature extraction and reducing the model's complexity. Experiments show that our end-to-end model outperforms convolutional backbones. The model attained a character error rate (CER) of 4.46%.
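The abstract reports a CER but does not say how it was computed. The sketch below assumes the common definition: total character-level edit distance between predictions and references, divided by the total number of reference characters. The function names `edit_distance` and `cer` are illustrative and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # cost of deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level character error rate (assumed definition, see lead-in)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars
```

Because Python strings are sequences of code points, each Arabic diacritic counts as its own character here. Whether the reported 4.46% was measured with or without diacritics, or after any text normalization, is not stated in the abstract.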
