Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning

Paper Title

Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning

Authors

Liulei Li, Tianfei Zhou, Wenguan Wang, Lu Yang, Jianwu Li, Yi Yang

Abstract

Our goal is to learn visual correspondence from unlabeled videos. We develop LIIR, a locality-aware inter- and intra-video reconstruction framework that fills in three missing pieces of the self-supervised correspondence learning puzzle: instance discrimination, location awareness, and spatial compactness. First, whereas most existing efforts focus on intra-video self-supervision only, we exploit cross-video affinities as extra negative samples within a unified inter- and intra-video reconstruction scheme. This enables instance-discriminative representation learning by contrasting the desired intra-video pixel associations against negative inter-video correspondences. Second, we merge position information into correspondence matching and design a position-shifting strategy to remove the side effect of position encoding during inter-video affinity computation, making LIIR location-sensitive. Third, to make full use of the spatial continuity of video data, we impose a compactness-based constraint on correspondence matching, yielding sparser and more reliable solutions. The learned representation surpasses self-supervised state-of-the-art methods on label propagation tasks involving objects, semantic parts, and keypoints.
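The first idea in the abstract, using cross-video affinities as extra negatives inside a reconstruction scheme, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function name `reconstruct`, the cosine-similarity affinity, the `temperature` value, and the choice to normalize the softmax over intra- and inter-video features jointly while reconstructing from the intra-video part only are all assumptions made for illustration. Position encoding, position shifting, and the compactness constraint are not modeled.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(f):
    """Scale each row (one feature vector per pixel) to unit length."""
    return f / np.linalg.norm(f, axis=1, keepdims=True)


def reconstruct(query, ref_intra, ref_inter, ref_values, temperature=0.07):
    """Hypothetical sketch: reconstruct query pixels from a reference frame.

    query:      (N, D) features of the frame to reconstruct
    ref_intra:  (M, D) features of a reference frame from the same video
    ref_inter:  (P, D) features from other videos, used only as negatives
    ref_values: (M, C) colors or labels attached to the reference pixels
    """
    q = l2_normalize(query)
    k_intra = l2_normalize(ref_intra)
    k_inter = l2_normalize(ref_inter)

    # Cosine similarities to same-video and other-video pixels.
    logits = np.concatenate([q @ k_intra.T, q @ k_inter.T], axis=1)

    # Assumption: one softmax over both sets, so inter-video pixels compete
    # with intra-video pixels for affinity mass.
    affinity = softmax(logits / temperature, axis=1)

    # Only the intra-video part carries values to copy from.
    intra_affinity = affinity[:, : ref_intra.shape[0]]
    return intra_affinity @ ref_values, intra_affinity


rng = np.random.default_rng(0)
query = rng.normal(size=(6, 16))
ref_intra = rng.normal(size=(8, 16))
ref_inter = rng.normal(size=(10, 16))
ref_values = rng.random(size=(8, 3))

recon, intra_affinity = reconstruct(query, ref_intra, ref_inter, ref_values)
print(recon.shape)                 # one reconstructed value vector per query pixel
print(intra_affinity.sum(axis=1))  # below 1 whenever negatives absorb affinity
```

In this sketch, any affinity mass that falls on other-video pixels is lost to the reconstruction, so a reconstruction loss would push features toward matching within the same video rather than across videos.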

Paper Title