Paper Title
Neighborhood Attention Transformer
Paper Authors
Paper Abstract
We present Neighborhood Attention (NA), the first efficient and scalable sliding-window attention mechanism for vision. NA is a pixel-wise operation, localizing self-attention (SA) to the nearest neighboring pixels, and therefore enjoys a linear time and space complexity compared to the quadratic complexity of SA. The sliding-window pattern allows NA's receptive field to grow without needing extra pixel shifts, and preserves translational equivariance, unlike Swin Transformer's Window Self Attention (WSA). We develop NATTEN (Neighborhood Attention Extension), a Python package with efficient C++ and CUDA kernels, which allows NA to run up to 40% faster than Swin's WSA while using up to 25% less memory. We further present Neighborhood Attention Transformer (NAT), a new hierarchical transformer design based on NA that boosts image classification and downstream vision performance. Experimental results on NAT are competitive; NAT-Tiny reaches 83.2% top-1 accuracy on ImageNet, 51.4% mAP on MS-COCO, and 48.4% mIoU on ADE20K, which are improvements of 1.9% in ImageNet accuracy, 1.0% in COCO mAP, and 2.6% in ADE20K mIoU over a Swin model of similar size. To support more research based on sliding-window attention, we open-source our project and release our checkpoints at: https://github.com/SHI-Labs/Neighborhood-Attention-Transformer.
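To make the mechanism described above concrete, the following is a minimal, unoptimized Python/PyTorch sketch of the neighborhood attention computation: each query pixel attends only to a k x k window of its nearest neighbors, with the window clamped at the borders so every pixel still sees exactly k x k keys. It omits learned Q/K/V projections, multi-head splitting, and relative positional biases, and it is not the NATTEN C++/CUDA implementation; the function name and its arguments are illustrative assumptions.

import torch

def neighborhood_attention(x: torch.Tensor, kernel_size: int = 7) -> torch.Tensor:
    """Naive single-head neighborhood attention over a (B, H, W, C) feature map."""
    B, H, W, C = x.shape
    k = kernel_size
    assert H >= k and W >= k, "feature map must be at least kernel_size in each dimension"
    scale = C ** -0.5
    out = torch.empty_like(x)
    for i in range(H):
        # Clamp the window start so a full k x k window stays inside the feature map.
        i0 = min(max(i - k // 2, 0), H - k)
        for j in range(W):
            j0 = min(max(j - k // 2, 0), W - k)
            q = x[:, i, j, :]                          # (B, C) query pixel
            kv = x[:, i0:i0 + k, j0:j0 + k, :]         # (B, k, k, C) neighborhood
            kv = kv.reshape(B, k * k, C)
            # Scaled dot-product attention restricted to the k*k neighbors.
            attn = torch.softmax((q.unsqueeze(1) * kv).sum(-1) * scale, dim=-1)  # (B, k*k)
            out[:, i, j, :] = (attn.unsqueeze(-1) * kv).sum(dim=1)               # (B, C)
    return out

# Example: a 14x14 feature map with 64 channels and a 7x7 neighborhood.
x = torch.randn(2, 14, 14, 64)
y = neighborhood_attention(x, kernel_size=7)
print(y.shape)  # torch.Size([2, 14, 14, 64])

Because each of the H*W query pixels only interacts with k*k neighbors, cost grows linearly in the number of pixels rather than quadratically as in global self-attention; the per-pixel loops here are exactly what the NATTEN kernels replace with efficient C++/CUDA code.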