DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection

Paper Title

DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection

Authors

Lin, Jingyu; Jiang, Jie; Yan, Yan; Guo, Chunchao; Wang, Hongfa; Liu, Wei; Wang, Hanzi

Abstract

The prosperity of deep learning has contributed to rapid progress in scene text detection. Among methods based on convolutional networks, segmentation-based ones have drawn extensive attention because of their superiority in detecting text instances of arbitrary shapes and extreme aspect ratios. However, these bottom-up methods are limited by the performance of their segmentation models. In this paper, we propose DPTNet (Dual-Path Transformer Network), a simple yet effective architecture that models both global and local information for the scene text detection task. We further propose a parallel design that integrates the convolutional network with a powerful self-attention mechanism to provide complementary clues between the attention path and the convolutional path. Moreover, a bi-directional interaction module across the two paths is developed to provide complementary clues in the channel and spatial dimensions. We also upgrade the concatenation operation by adding an extra multi-head attention layer to it. Our DPTNet achieves state-of-the-art results on the MSRA-TD500 dataset and provides competitive results on other standard benchmarks in terms of both detection accuracy and speed.
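The parallel design described in the abstract can be illustrated with a toy sketch. The Python code below is not the authors' implementation. It assumes a feature map stored as nested lists (H x W positions, C channels each), replaces the learned convolution with a fixed 3x3 mean filter and the learned query/key/value projections with identity maps, and uses single-head attention. The function names conv_path, attention_path and dual_path_block are hypothetical. It shows only the core idea of running a local convolution path and a global self-attention path side by side on the same input and concatenating their outputs along the channel dimension. The bi-directional interaction module and the extra multi-head attention layer on the concatenation are omitted.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H rows x W columns, each position a C-dim vector


def conv_path(x: FeatureMap) -> FeatureMap:
    """Local path: per-channel 3x3 mean filter with zero padding.

    Hypothetical stand-in for a learned convolution; each output position
    only sees its 3x3 neighborhood.
    """
    h, w, c = len(x), len(x[0]), len(x[0][0])
    out: FeatureMap = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = [0.0] * c
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # positions outside the map count as zero
                        for ch in range(c):
                            acc[ch] += x[ii][jj][ch]
            row.append([v / 9.0 for v in acc])
        out.append(row)
    return out


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_path(x: FeatureMap) -> FeatureMap:
    """Global path: single-head scaled dot-product self-attention over all H*W positions.

    Queries, keys and values are the raw features (identity projections), an
    assumption made to keep the sketch weight-free.
    """
    h, w, c = len(x), len(x[0]), len(x[0][0])
    tokens = [x[i][j] for i in range(h) for j in range(w)]  # flatten the grid row by row
    scale = 1.0 / math.sqrt(c)
    outputs = []
    for q in tokens:
        weights = softmax([scale * sum(a * b for a, b in zip(q, key)) for key in tokens])
        # Each output is a weighted average of every position's features.
        outputs.append([sum(wt * v[ch] for wt, v in zip(weights, tokens)) for ch in range(c)])
    return [outputs[i * w:(i + 1) * w] for i in range(h)]  # restore the H x W layout


def dual_path_block(x: FeatureMap) -> FeatureMap:
    """Run both paths on the same input and concatenate along channels (C -> 2C)."""
    local, global_ctx = conv_path(x), attention_path(x)
    return [[local[i][j] + global_ctx[i][j] for j in range(len(x[0]))] for i in range(len(x))]


if __name__ == "__main__":
    fmap = [[[float(i), float(j)] for j in range(3)] for i in range(2)]  # 2 x 3 map, C = 2
    y = dual_path_block(fmap)
    print(len(y), len(y[0]), len(y[0][0]))  # 2 3 4
```

Because every position in the attention path attends to all H x W positions while the convolution path only sees a 3x3 neighborhood, the concatenated output carries both global and local context, which is the kind of complementarity the abstract refers to.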
