Paper Title


Multi-manifold Attention for Vision Transformers

Paper Authors

Dimitrios Konstantinidis, Ilias Papastratis, Kosmas Dimitropoulos, Petros Daras

Paper Abstract


Vision Transformers are very popular nowadays due to their state-of-the-art performance in several computer vision tasks, such as image classification and action recognition. Although their performance has been greatly enhanced through highly descriptive patch embeddings and hierarchical structures, there is still limited research on utilizing additional data representations so as to refine the self-attention map of a Transformer. To address this problem, a novel attention mechanism, called multi-manifold multi-head attention, is proposed in this work to substitute the vanilla self-attention of a Transformer. The proposed mechanism models the input space in three distinct manifolds, namely Euclidean, Symmetric Positive Definite and Grassmann, thus leveraging different statistical and geometrical properties of the input for the computation of a highly descriptive attention map. In this way, the proposed attention mechanism can guide a Vision Transformer to become more attentive towards important appearance, color and texture features of an image, leading to improved classification and segmentation results, as shown by the experimental results on well-known datasets.
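Since the abstract only states that attention is computed from Euclidean, SPD, and Grassmann representations of the input and then used in place of vanilla self-attention, the following is a minimal, hypothetical PyTorch sketch of how such a layer could be structured. The MultiManifoldAttention class name, the learnable fusion weights, and the simplified SPD and Grassmann similarity measures are illustrative assumptions, not the paper's actual formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiManifoldAttention(nn.Module):
    """Sketch of a multi-manifold multi-head attention layer.

    The abstract specifies three manifold views (Euclidean, SPD,
    Grassmann) but not the distance functions or fusion scheme;
    everything below the Euclidean branch is an illustrative guess.
    """

    def __init__(self, dim, num_heads=8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Hypothetical learnable weights for fusing the three attention maps.
        self.fusion = nn.Parameter(torch.ones(3) / 3)

    def forward(self, x):
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)

        # 1) Euclidean branch: standard scaled dot-product similarity.
        attn_e = (q @ k.transpose(-2, -1)) * self.scale

        # 2) SPD branch (assumption): a cheap stand-in for comparing tokens
        #    through second-order (covariance-like) statistics, here the
        #    squared dot product.
        attn_s = (q @ k.transpose(-2, -1)).pow(2) * self.scale

        # 3) Grassmann branch (assumption): treat each token as a 1-d
        #    subspace; squared cosine similarity of the normalized vectors
        #    corresponds to the projection metric between such subspaces.
        qn = F.normalize(q, dim=-1)
        kn = F.normalize(k, dim=-1)
        attn_g = (qn @ kn.transpose(-2, -1)).pow(2)

        # Fuse the three manifold views, then normalize into attention weights.
        w = torch.softmax(self.fusion, dim=0)
        attn = (w[0] * attn_e + w[1] * attn_s + w[2] * attn_g).softmax(dim=-1)

        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


if __name__ == "__main__":
    x = torch.randn(2, 197, 384)  # (batch, patches + cls token, embed dim)
    layer = MultiManifoldAttention(dim=384, num_heads=6)
    print(layer(x).shape)  # torch.Size([2, 197, 384])

As in the paper's stated design, such a module is a drop-in replacement for the vanilla self-attention block of a Vision Transformer; only the attention-map computation changes, so the surrounding patch embedding and feed-forward layers stay untouched.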
