Paper Title

Dual-Flattening Transformers through Decomposed Row and Column Queries for Semantic Segmentation

Paper Authors

Ying Wang, Chiuman Ho, Wenju Xu, Ziwei Xuan, Xudong Liu, Guo-Jun Qi

Paper Abstract

It is critical to obtain high-resolution features with long-range dependency for dense prediction tasks such as semantic segmentation. To generate a high-resolution output of size $H\times W$ from a low-resolution feature map of size $h\times w$ ($hw\ll HW$), a naive dense transformer incurs an intractable complexity of $\mathcal{O}(hwHW)$, limiting its application to high-resolution dense prediction. We propose a Dual-Flattening Transformer (DFlatFormer) to enable high-resolution output by reducing the complexity to $\mathcal{O}(hw(H+W))$, which is multiple orders of magnitude smaller than that of the naive dense transformer. Decomposed queries are presented to retrieve row and column attentions tractably through separate transformers, and their outputs are combined to form a dense feature map at high resolution. To this end, the input sequence fed from an encoder is row-wise and column-wise flattened to align with the decomposed queries by preserving its row and column structures, respectively. The row and column transformers also interact with each other to capture their mutual attentions at the spatial crossings between rows and columns. We also propose to perform attentions through efficient grouping and pooling to further reduce the model complexity. Extensive experiments on the ADE20K and Cityscapes datasets demonstrate the superiority of the proposed dual-flattening transformer architecture with higher mIoUs.
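To make the complexity argument concrete, the sketch below shows one way decomposed row and column queries can attend over row-wise and column-wise flattened encoder features and then be fused into a dense high-resolution map, so that attention cost scales as $\mathcal{O}(hw(H+W))$ instead of $\mathcal{O}(hwHW)$. This is a minimal illustrative PyTorch sketch, not the authors' DFlatFormer: the module name `RowColumnDecomposedAttention`, the use of `nn.MultiheadAttention`, and the broadcast-add fusion are assumptions, and the paper's row-column interactions and grouping/pooling are omitted.

```python
# Minimal sketch of decomposed row/column attention for high-resolution output.
# NOT the authors' DFlatFormer implementation; names and the additive fusion of
# row and column outputs are assumptions made for illustration only.
import torch
import torch.nn as nn


class RowColumnDecomposedAttention(nn.Module):
    """Upsample an (h, w) feature map to (H, W) using H row queries and
    W column queries instead of H*W dense queries, so attention cost is
    O(hw*(H+W)) rather than O(hw*H*W)."""

    def __init__(self, dim: int, out_h: int, out_w: int, num_heads: int = 4):
        super().__init__()
        self.row_queries = nn.Parameter(torch.randn(out_h, dim))  # one query per output row
        self.col_queries = nn.Parameter(torch.randn(out_w, dim))  # one query per output column
        self.row_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, h, w) low-resolution encoder features
        b, c, h, w = feat.shape
        # Row-wise flattening: scan rows first -> sequence of length h*w.
        row_seq = feat.flatten(2).transpose(1, 2)                 # (B, h*w, C)
        # Column-wise flattening: scan columns first.
        col_seq = feat.permute(0, 3, 2, 1).reshape(b, w * h, c)   # (B, w*h, C)

        row_q = self.row_queries.unsqueeze(0).expand(b, -1, -1)   # (B, H, C)
        col_q = self.col_queries.unsqueeze(0).expand(b, -1, -1)   # (B, W, C)

        row_out, _ = self.row_attn(row_q, row_seq, row_seq)       # (B, H, C)
        col_out, _ = self.col_attn(col_q, col_seq, col_seq)       # (B, W, C)

        # Combine row and column features at their spatial crossings.
        # Broadcast-add fusion is an assumption for this sketch.
        dense = row_out.unsqueeze(2) + col_out.unsqueeze(1)       # (B, H, W, C)
        return dense.permute(0, 3, 1, 2)                          # (B, C, H, W)


if __name__ == "__main__":
    x = torch.randn(2, 64, 16, 16)                                # h = w = 16
    module = RowColumnDecomposedAttention(dim=64, out_h=128, out_w=128)
    print(module(x).shape)                                        # torch.Size([2, 64, 128, 128])
```

In this sketch the two cross-attentions touch the $hw$-length sequence $H$ and $W$ times respectively, which is where the $\mathcal{O}(hw(H+W))$ cost comes from; a dense transformer would instead need $HW$ queries over the same sequence.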
