Paper Title

HiP: Hierarchical Perceiver

Authors

Joao Carreira, Skanda Koppula, Daniel Zoran, Adria Recasens, Catalin Ionescu, Olivier Henaff, Evan Shelhamer, Relja Arandjelovic, Matt Botvinick, Oriol Vinyals, Karen Simonyan, Andrew Zisserman, Andrew Jaegle

Abstract

General perception systems such as Perceivers can process arbitrary modalities in any combination and are able to handle up to a few hundred thousand inputs. They achieve this generality by using exclusively global attention operations. This, however, hinders them from scaling up to the input sizes required to process raw high-resolution images or video. In this paper, we show that some degree of locality can be introduced back into these models, greatly improving their efficiency while preserving their generality. To scale them further, we introduce a self-supervised approach that enables learning dense low-dimensional positional embeddings for very large signals. We call the resulting model a Hierarchical Perceiver (HiP). In sum, our contributions are: 1) scaling Perceiver-type models to raw high-resolution images and audio+video, 2) showing the feasibility of learning 1M+ positional embeddings from scratch using masked auto-encoding, 3) demonstrating competitive performance on raw data from the ImageNet, AudioSet, PASCAL VOC, ModelNet40 and Kinetics datasets with the same exact, unchanged model and without specialized preprocessing or any tokenization.
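To make the efficiency argument concrete, the following is a minimal NumPy sketch contrasting global cross-attention with a grouped, local variant. It is an illustration under stated assumptions, not the architecture from the paper: the abstract does not specify how locality is introduced, so the contiguous grouping scheme, the absence of learned projections, and all function names (`cross_attend`, `grouped_cross_attention`, `score_count`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(latents, inputs):
    # Single-head cross-attention with no learned projections (assumption:
    # a real model would use learned query/key/value maps).
    # latents: (..., K, D), inputs: (..., M, D) -> (..., K, D)
    d = latents.shape[-1]
    scores = latents @ np.swapaxes(inputs, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ inputs


def global_cross_attention(latents, inputs):
    # Every latent attends to every input: a K x M score matrix.
    return cross_attend(latents, inputs)


def grouped_cross_attention(latents, inputs, num_groups):
    # Hypothetical locality scheme: split inputs and latents into contiguous
    # groups, and let each latent attend only to inputs in its own group.
    m, d = inputs.shape
    k, _ = latents.shape
    assert m % num_groups == 0 and k % num_groups == 0
    x = inputs.reshape(num_groups, m // num_groups, d)
    z = latents.reshape(num_groups, k // num_groups, d)
    return cross_attend(z, x).reshape(k, d)


def score_count(num_latents, num_inputs, num_groups=1):
    # Number of attention scores computed:
    # G groups, each (K/G) x (M/G), gives K * M / G in total.
    return num_latents * num_inputs // num_groups


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents = rng.normal(size=(256, 32))
    inputs = rng.normal(size=(4096, 32))
    out = grouped_cross_attention(latents, inputs, num_groups=16)
    print(out.shape)
    print(score_count(256, 4096), score_count(256, 4096, num_groups=16))
```

Under this assumed grouping, the number of attention scores drops by a factor equal to the number of groups, and with a single group the local variant reduces to global attention.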
