Paper Title
SimA: Simple Softmax-free Attention for Vision Transformers
Paper Authors
Paper Abstract
Recently, vision transformers have become very popular. However, deploying them in many applications is computationally expensive, partly due to the Softmax layer in the attention block. We introduce SimA, a simple but effective Softmax-free attention block that normalizes the query and key matrices with a simple $\ell_1$-norm instead of using a Softmax layer. The attention block in SimA is then a simple multiplication of three matrices, so SimA can dynamically change the order of the computation at test time to achieve computation that is linear in either the number of tokens or the number of channels. We empirically show that SimA applied to three SOTA variants of transformers, DeiT, XCiT, and CvT, results in on-par accuracy compared to the SOTA models, without any need for a Softmax layer. Interestingly, changing SimA from multi-head to single-head has only a small effect on accuracy, which simplifies the attention block further. The code is available here: https://github.com/UCDvision/sima
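The reordering idea described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation (which is in the linked repository); the normalization axis and the epsilon term are assumptions made for illustration. The three-matrix product is evaluated as either $(QK^\top)V$ or $Q(K^\top V)$, whichever is cheaper for the given token count $N$ and channel count $D$.

```python
import torch


def sima_attention_sketch(q, k, v, eps=1e-6):
    """Hypothetical sketch of SimA-style Softmax-free attention (single head).

    q, k, v: tensors of shape (N, D), where N is the number of tokens
    and D is the number of channels.
    """
    n, d = q.shape
    # l1-normalize each channel of Q and K over the token axis
    # (assumed axis; eps added for numerical stability).
    q = q / (q.abs().sum(dim=0, keepdim=True) + eps)
    k = k / (k.abs().sum(dim=0, keepdim=True) + eps)

    if n <= d:
        # (Q K^T) V costs O(N^2 D): cheaper when there are few tokens.
        return (q @ k.transpose(0, 1)) @ v
    # Q (K^T V) costs O(N D^2): linear in the number of tokens.
    return q @ (k.transpose(0, 1) @ v)


# Example usage with 196 tokens and 64 channels per head.
out = sima_attention_sketch(torch.randn(196, 64), torch.randn(196, 64), torch.randn(196, 64))
print(out.shape)  # torch.Size([196, 64])
```

Because both orderings produce the same result, the choice can be made at test time based on whether $N$ or $D$ is larger, which is what gives SimA its linear-computation property.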