Paper Title

Masked Contrastive Representation Learning

Paper Authors

Yuchong Yao, Nandakishor Desai, Marimuthu Palaniswami

Paper Abstract

Masked image modelling (e.g., Masked AutoEncoder) and contrastive learning (e.g., Momentum Contrast) have shown impressive performance in unsupervised visual representation learning. This work presents Masked Contrastive Representation Learning (MACRL) for self-supervised visual pre-training. In particular, MACRL leverages the effectiveness of both masked image modelling and contrastive learning. We adopt an asymmetric setting for the Siamese network (i.e., an encoder-decoder structure in both branches), where one branch applies a higher mask ratio and stronger data augmentation, while the other adopts weaker data corruption. We optimize a contrastive learning objective on the features learned by the encoders of the two branches. Furthermore, we minimize an $L_1$ reconstruction loss on the decoders' outputs. In our experiments, MACRL achieves superior results on various vision benchmarks, including CIFAR-10, CIFAR-100, Tiny-ImageNet, and two other ImageNet subsets. Our framework provides a unified perspective on self-supervised visual pre-training and insights for future research.
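To make the objective concrete, below is a minimal PyTorch-style sketch of the two-branch setup described in the abstract: both corrupted views pass through an encoder-decoder, a contrastive loss is applied to the encoder features, and an $L_1$ loss is applied to the reconstructions. The toy MLP encoder and decoder, the random token masking used as a stand-in for MAE-style patch masking and stronger augmentation, the shared weights between branches, and all hyper-parameters (mask ratios, temperature, loss weight) are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of a MACRL-style objective, under the assumptions stated above.
import torch
import torch.nn as nn
import torch.nn.functional as F


def random_mask(x, ratio):
    """Zero out a random fraction of patch tokens (simple stand-in for MAE-style masking)."""
    keep = (torch.rand(x.shape[:2], device=x.device) > ratio).float().unsqueeze(-1)
    return x * keep


def info_nce(z1, z2, temperature=0.2):
    """Contrastive loss treating the two branches' features of the same image as positives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)


class ToyBranch(nn.Module):
    """Stand-in encoder-decoder operating on (B, N, D) patch embeddings."""
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.decoder = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))


def macrl_loss(branch, patches, strong_ratio=0.75, weak_ratio=0.25, lam=1.0):
    # Two corrupted views: one heavily masked, one lightly masked (asymmetric branches).
    feat_s = branch.encoder(random_mask(patches, strong_ratio))
    feat_w = branch.encoder(random_mask(patches, weak_ratio))

    # Contrastive objective on pooled encoder features of the two branches.
    loss_con = info_nce(feat_s.mean(dim=1), feat_w.mean(dim=1))

    # L1 reconstruction of the uncorrupted patches from both decoders.
    loss_rec = F.l1_loss(branch.decoder(feat_s), patches) + F.l1_loss(branch.decoder(feat_w), patches)
    return loss_con + lam * loss_rec


if __name__ == "__main__":
    branch = ToyBranch()
    patches = torch.randn(8, 16, 64)  # batch of 8 images, 16 patches, embedding dim 64
    print(macrl_loss(branch, patches).item())
```

In the paper's asymmetric setting the two branches also differ in augmentation strength; in this sketch that asymmetry is reduced to the two mask ratios for brevity.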
