Paper Title
TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognition
Paper Authors
Paper Abstract
Recognizing transformation types applied to a video clip (RecogTrans) is a long-established paradigm for self-supervised video representation learning, yet it achieves much lower performance than instance discrimination approaches (InstDisc) in recent works. However, based on a thorough comparison of representative RecogTrans and InstDisc methods, we observe great potential of RecogTrans on both semantic-related and temporal-related downstream tasks. Being based on hard-label classification, existing RecogTrans approaches suffer from noisy supervision signals in pre-training. To mitigate this problem, we develop TransRank, a unified framework for recognizing Transformations in a Ranking formulation. TransRank provides accurate supervision signals by recognizing transformations relatively, consistently outperforming classification-based formulations. Meanwhile, the unified framework can be instantiated with an arbitrary set of temporal or spatial transformations, demonstrating good generality. With the ranking-based formulation and several empirical practices, we achieve competitive performance on video retrieval and action recognition. Under the same setting, TransRank surpasses the previous state-of-the-art method by 6.4% on UCF101 and 8.3% on HMDB51 for action recognition (Top-1 Acc), and improves video retrieval on UCF101 by 20.4% (R@1). These promising results validate that RecogTrans is still a paradigm worth exploring for video self-supervised learning. Code will be released at https://github.com/kennymckormick/TransRank.
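To make the "ranking instead of hard-label classification" idea concrete, below is a minimal, hypothetical sketch of a pairwise margin ranking loss over per-clip transformation scores. It is an illustration of the general ranking formulation the abstract describes, not the paper's exact TransRank objective; the function name, margin value, and score layout are assumptions for this example.

```python
def ranking_transform_loss(scores, applied_idx, margin=1.0):
    """Pairwise margin ranking loss over transformation scores for ONE clip.

    scores      : list of floats, one score per candidate transformation
                  (e.g. playback speeds), produced by the model for this clip.
    applied_idx : index of the transformation actually applied to the clip.
    margin      : required gap between the applied transformation's score and
                  every other transformation's score (hypothetical value).

    Instead of a hard-label cross-entropy over transformation classes, the
    applied transformation only needs to rank above the others for the SAME
    clip, which is a relative (per-clip) supervision signal.
    """
    pos = scores[applied_idx]
    # hinge on each (applied, other) pair: penalize when pos - s < margin
    losses = [max(0.0, margin - (pos - s))
              for i, s in enumerate(scores) if i != applied_idx]
    return sum(losses) / len(losses)
```

For example, if the applied transformation already outscores the alternatives by more than the margin (`scores = [3.0, 0.5, 0.2]`, `applied_idx = 0`), the loss is zero; if it ranks below another candidate (`scores = [0.5, 1.0, 0.2]`), the hinge terms become positive and push the applied transformation's score up relative to the others.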