TPFNet: A Novel Text In-painting Transformer for Text Removal

Paper Title

TPFNet: A Novel Text In-painting Transformer for Text Removal

Authors

Onkar Susladkar, Dhruv Makwana, Gayatri Deshmukh, Sparsh Mittal, Sai Chandra Teja R, Rekha Singhal

Abstract

Text erasure from an image is helpful for various tasks such as image editing and privacy preservation. In this paper, we present TPFNet, a novel one-stage (end-to-end) network for text removal from images. Our network has two parts: feature synthesis and image generation. Since noise can be more effectively removed from low-resolution images, part 1 operates on low-resolution images. The output of part 1 is a low-resolution text-free image. Part 2 uses the features learned in part 1 to predict a high-resolution text-free image. In part 1, we use the "pyramidal vision transformer" (PVT) as the encoder. Further, we use a novel multi-headed decoder that generates a high-pass filtered image and a segmentation map, in addition to a text-free image. The segmentation branch helps locate the text precisely, and the high-pass branch helps in learning the image structure. To precisely locate the text, TPFNet employs an adversarial loss that is conditional on the segmentation map rather than the input image. On the Oxford, SCUT, and SCUT-EnsText datasets, our network outperforms recently proposed networks on nearly all the metrics. For example, on the SCUT-EnsText dataset, TPFNet has a PSNR (higher is better) of 39.0 and text-detection precision (lower is better) of 21.1, compared to the best previous technique, which has a PSNR of 32.3 and precision of 53.2. The source code can be obtained from https://github.com/CandleLabAI/TPFNet.
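
The abstract reports PSNR (higher is better) as a headline metric. As a rough illustration of that metric only, and not the authors' evaluation code, the sketch below computes the standard definition PSNR = 10 * log10(MAX^2 / MSE) for two images given as flat lists of pixel values. The function name `psnr`, the `max_value` default of 255 (8-bit pixels), and the toy pixel values are assumptions made for this example; the paper's actual evaluation pipeline may differ in color handling and value range.

```python
import math


def psnr(reference, output, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences."""
    if len(reference) != len(output):
        raise ValueError("images must have the same number of pixels")
    if not reference:
        raise ValueError("images must not be empty")
    # Mean squared error over all pixels.
    mse = sum((r - o) ** 2 for r, o in zip(reference, output)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 grayscale images flattened to lists (made-up values, not paper data).
ground_truth = [52, 55, 61, 59]
prediction = [54, 55, 60, 59]
print(f"PSNR: {psnr(ground_truth, prediction):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, the gap between 32.3 and 39.0 quoted in the abstract corresponds to a several-fold reduction in mean squared error under this definition.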
