Paper Title
Self-Supervised Visual Representation Learning with Semantic Grouping
Paper Authors
Paper Abstract
In this paper, we tackle the problem of learning visual representations from unlabeled scene-centric data. Existing works have demonstrated the potential of utilizing the underlying complex structure within scene-centric data; still, they commonly rely on hand-crafted objectness priors or specialized pretext tasks to build a learning framework, which may harm generalizability. Instead, we propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning. The semantic grouping is performed by assigning pixels to a set of learnable prototypes, which can adapt to each sample by attentive pooling over the feature map to form new slots. Based on the learned data-dependent slots, a contrastive objective is employed for representation learning, which enhances the discriminability of features and, conversely, facilitates grouping semantically coherent pixels together. Compared with previous efforts, by simultaneously optimizing the two coupled objectives of semantic grouping and contrastive learning, our approach bypasses the disadvantages of hand-crafted priors and is able to learn object/group-level representations from scene-centric images. Experiments show that our approach effectively decomposes complex scenes into semantic groups for feature learning and significantly benefits downstream tasks, including object detection, instance segmentation, and semantic segmentation. Code is available at: https://github.com/CVMI-Lab/SlotCon.
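To make the described pipeline concrete, below is a minimal PyTorch sketch of the two coupled steps the abstract outlines: soft-assigning dense pixel features to a set of learnable prototypes, attentively pooling pixels into per-sample slots, and contrasting corresponding slots across two augmented views. All names here (SlotGrouping, num_prototypes, tau, slot_contrastive_loss) are illustrative assumptions rather than the authors' API; the actual implementation lives in the repository linked above.

```python
# Illustrative sketch only -- not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SlotGrouping(nn.Module):
    """Assign pixels to learnable prototypes and pool them into slots."""

    def __init__(self, dim=256, num_prototypes=256, tau=0.07):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim))
        self.tau = tau  # temperature for the pixel-to-prototype assignment

    def forward(self, feats):
        # feats: (B, C, H, W) dense features from a backbone + projector
        pixels = feats.flatten(2).transpose(1, 2)                  # (B, HW, C)
        sim = F.normalize(pixels, dim=-1) @ F.normalize(
            self.prototypes, dim=-1).t()                           # (B, HW, K)
        attn = (sim / self.tau).softmax(dim=1)                     # weights over pixels per prototype
        slots = attn.transpose(1, 2) @ pixels                      # (B, K, C) attentive pooling
        return slots, sim


def slot_contrastive_loss(slots_q, slots_k, temperature=0.1):
    """InfoNCE over slots from two augmented views: the positive pair is
    the same slot index; all other slots in the batch are negatives."""
    B, K, C = slots_q.shape
    q = F.normalize(slots_q.reshape(B * K, C), dim=-1)
    k = F.normalize(slots_k.reshape(B * K, C), dim=-1)
    logits = q @ k.t() / temperature                               # (BK, BK)
    targets = torch.arange(B * K, device=q.device)
    return F.cross_entropy(logits, targets)


# Usage: pool each view's dense features into slots, then contrast them.
grouper = SlotGrouping()
feats_v1, feats_v2 = torch.randn(2, 4, 256, 14, 14)                # two augmented views
slots_v1, _ = grouper(feats_v1)
slots_v2, _ = grouper(feats_v2)
loss = slot_contrastive_loss(slots_v1, slots_v2)
```

In the full method, the grouping and contrastive objectives are optimized jointly across the two views; the sketch above keeps only the core tensor operations.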