Online Deep Clustering with Video Track Consistency

Paper Title

Online Deep Clustering with Video Track Consistency

Authors

Alessandra Alfani, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo

Abstract

Several unsupervised and self-supervised approaches have been developed in recent years to learn visual features from large-scale unlabeled datasets. Their main drawback, however, is that these methods are hardly able to recognize visual features of the same object if it is simply rotated or the perspective of the camera changes. To overcome this limitation and at the same time exploit a useful source of supervision, we take into account video object tracks. Following the intuition that two patches in a track should have similar visual representations in a learned feature space, we adopt an unsupervised clustering-based approach and constrain such representations to be labeled as the same category, since they likely belong to the same object or object part. Experimental results on two downstream tasks on different datasets demonstrate the effectiveness of our Online Deep Clustering with Video Track Consistency (ODCT) approach compared to prior work, which did not leverage temporal information. In addition, we show that exploiting an unsupervised, class-agnostic, yet noisy track generator yields better accuracy than relying on costly and precise track annotations.
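
The track-consistency idea described in the abstract (patches from the same track should receive the same cluster label) can be illustrated with a minimal sketch. This is not the ODCT implementation: the abstract does not say how the constraint is enforced, so the majority vote used here, the function names, and the toy data are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from typing import List, Sequence


def nearest_centroid(feature: Sequence[float],
                     centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to `feature` (squared Euclidean)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda k: sq_dist(feature, centroids[k]))


def track_consistent_labels(features: Sequence[Sequence[float]],
                            track_ids: Sequence[int],
                            centroids: Sequence[Sequence[float]]) -> List[int]:
    """Assign one pseudo-label per track (hypothetical majority-vote rule)."""
    # Step 1: independent per-patch pseudo-labels from the current centroids.
    raw = [nearest_centroid(f, centroids) for f in features]

    # Step 2: count the per-patch labels inside each track.
    votes = defaultdict(Counter)
    for label, track in zip(raw, track_ids):
        votes[track][label] += 1

    # Step 3: every patch in a track takes the track's majority label
    # (ties are broken by the lowest cluster index; this is an assumption).
    track_label = {
        track: min(counts, key=lambda k: (-counts[k], k))
        for track, counts in votes.items()
    }
    return [track_label[track] for track in track_ids]


# Toy data: the third patch of track 0 (e.g. a rotated view) falls near the
# wrong centroid, and the track constraint pulls it back to the track's label.
centroids = [[0.0, 0.0], [10.0, 10.0]]
features = [[0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [10.0, 9.0], [11.0, 10.0]]
track_ids = [0, 0, 0, 1, 1]
print(track_consistent_labels(features, track_ids, centroids))  # [0, 0, 0, 1, 1]
```

In an actual training loop, labels produced this way would serve as classification targets for the feature extractor; how the paper handles that step is not specified in the abstract.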
