Paper Title

Jigsaw-ViT: Learning Jigsaw Puzzles in Vision Transformer

Authors

Yingyi Chen, Xi Shen, Yahui Liu, Qinghua Tao, Johan A. K. Suykens

Abstract

The success of Vision Transformer (ViT) in various computer vision tasks has promoted the ever-increasing prevalence of this convolution-free network. The fact that ViT works on image patches makes it potentially relevant to the problem of jigsaw puzzle solving, which is a classical self-supervised task aiming at reordering shuffled sequential image patches back to their natural form. Despite its simplicity, solving jigsaw puzzle has been demonstrated to be helpful for diverse tasks using Convolutional Neural Networks (CNNs), such as self-supervised feature representation learning, domain generalization, and fine-grained classification. In this paper, we explore solving jigsaw puzzle as a self-supervised auxiliary loss in ViT for image classification, named Jigsaw-ViT. We show two modifications that can make Jigsaw-ViT superior to standard ViT: discarding positional embeddings and masking patches randomly. Yet simple, we find that Jigsaw-ViT is able to improve both in generalization and robustness over the standard ViT, which is usually rather a trade-off. Experimentally, we show that adding the jigsaw puzzle branch provides better generalization than ViT on large-scale image classification on ImageNet. Moreover, the auxiliary task also improves robustness to noisy labels on Animal-10N, Food-101N, and Clothing1M as well as adversarial examples. Our implementation is available at https://yingyichen-cyy.github.io/Jigsaw-ViT/.
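
The auxiliary objective described in the abstract can be illustrated with a small, framework-free sketch. It is not the authors' implementation. It assumes that masked patches are simply dropped, that the jigsaw branch outputs one logit vector over original positions for each kept patch, and that the two losses are combined with a hypothetical weight `eta`. The function names are illustrative only.

```python
import math
import random


def make_jigsaw_targets(num_patches, mask_ratio, rng):
    """Shuffle patch indices and drop a random subset (assumed masking scheme).

    Returns the original positions of the kept patches in shuffled order.
    These positions serve as classification targets for the jigsaw branch.
    """
    order = list(range(num_patches))
    rng.shuffle(order)  # shuffled patch order fed to the encoder
    num_keep = num_patches - int(mask_ratio * num_patches)
    return order[:num_keep]  # the remaining patches are treated as masked


def cross_entropy(logits, target):
    """Cross-entropy of one logit vector against an integer class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def jigsaw_loss(position_logits, targets):
    """Mean cross-entropy over the kept patches."""
    losses = [cross_entropy(l, t) for l, t in zip(position_logits, targets)]
    return sum(losses) / len(losses)


def total_loss(cls_loss, jig_loss, eta=0.1):
    """Classification loss plus weighted jigsaw loss (eta is an assumed value)."""
    return cls_loss + eta * jig_loss


rng = random.Random(0)
targets = make_jigsaw_targets(num_patches=16, mask_ratio=0.25, rng=rng)

# Stand-in for the jigsaw head output: uniform logits over the 16 positions.
# No positional embeddings are involved, so position must be inferred from content.
logits = [[0.0] * 16 for _ in targets]
loss = jigsaw_loss(logits, targets)
```

With uniform logits the jigsaw loss equals the log of the number of positions, which is the chance-level baseline a trained jigsaw head would need to beat.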
