Paper Title

Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective

Paper Authors

Pengfei Wei, Lingdong Kong, Xinghua Qu, Yi Ren, Zhiqiang Xu, Jing Jiang, Xiang Yin

Abstract

Unsupervised video domain adaptation is a practical yet challenging task. In this work, for the first time, we tackle it from a disentanglement view. Our key idea is to handle the spatial and temporal domain divergence separately through disentanglement. Specifically, we consider the generation of cross-domain videos from two sets of latent factors, one encoding the static information and the other encoding the dynamic information. A Transfer Sequential VAE (TranSVAE) framework is then developed to model such generation. To better serve adaptation, we propose several objectives to constrain the latent factors. With these constraints, the spatial divergence can be readily removed by disentangling out the static domain-specific information, and the temporal divergence is further reduced through adversarial learning at both the frame and video levels. Extensive experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE compared with several state-of-the-art approaches. The code is publicly available.
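To make the disentanglement idea concrete, below is a minimal PyTorch sketch of the general pattern the abstract describes: a sequential VAE whose encoder splits a clip into one clip-level static latent and per-frame dynamic latents, with a gradient-reversal domain classifier on the dynamic branch for adversarial frame-level alignment. All module names, dimensions, and layer choices here are illustrative assumptions, not the authors' released TranSVAE implementation.

```python
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass —
    the standard trick for adversarial domain alignment."""

    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.clone()

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lamb * grad, None


class DisentangledVideoVAE(nn.Module):
    """Toy sequential VAE: each video is encoded into a static latent z_s
    (one vector per clip) and dynamic latents z_t (one vector per frame).
    Dimensions and architecture are illustrative, not the paper's."""

    def __init__(self, feat_dim=512, z_static=64, z_dynamic=64, hidden=256):
        super().__init__()
        # Static branch: mean-pool frames, then infer one clip-level latent.
        self.static_mu = nn.Linear(feat_dim, z_static)
        self.static_logvar = nn.Linear(feat_dim, z_static)
        # Dynamic branch: an LSTM over frame features yields per-frame latents.
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.dyn_mu = nn.Linear(hidden, z_dynamic)
        self.dyn_logvar = nn.Linear(hidden, z_dynamic)
        # Decoder reconstructs frame features from the concatenation [z_s; z_t].
        self.decoder = nn.Sequential(
            nn.Linear(z_static + z_dynamic, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim))
        # Frame-level domain classifier behind gradient reversal, so training
        # pushes the dynamic latents toward domain invariance.
        self.domain_clf = nn.Linear(z_dynamic, 2)

    @staticmethod
    def sample(mu, logvar):
        # Reparameterization trick.
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, frames, grl_lambda=1.0):
        # frames: (batch, time, feat_dim) pre-extracted frame features.
        pooled = frames.mean(dim=1)                                  # (B, F)
        z_s = self.sample(self.static_mu(pooled), self.static_logvar(pooled))
        h, _ = self.rnn(frames)                                      # (B, T, H)
        z_t = self.sample(self.dyn_mu(h), self.dyn_logvar(h))        # (B, T, Zd)
        z_s_rep = z_s.unsqueeze(1).expand(-1, frames.size(1), -1)
        recon = self.decoder(torch.cat([z_s_rep, z_t], dim=-1))      # (B, T, F)
        # Domain logits on gradient-reversed dynamic latents (frame level).
        dom_logits = self.domain_clf(GradientReversal.apply(z_t, grl_lambda))
        return recon, z_s, z_t, dom_logits


# Usage sketch: 4 clips of 16 frames, each frame a 512-d feature vector.
model = DisentangledVideoVAE()
x = torch.randn(4, 16, 512)
recon, z_s, z_t, dom_logits = model(x)
```

In this setup, a reconstruction loss plus KL terms would train the VAE, while a cross-entropy loss on `dom_logits` (with source/target domain labels) drives the adversarial alignment of the dynamic latents; because the static latent z_s absorbs clip-level appearance, domain-specific spatial information can simply be discarded at adaptation time, mirroring the separation the abstract describes.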
