Paper Title

Multi-dataset Training of Transformers for Robust Action Recognition

Authors

Junwei Liang, Enwei Zhang, Jun Zhang, Chunhua Shen

Abstract

We study the task of learning robust feature representations that generalize well across multiple datasets for action recognition. We build our method on Transformers for their efficacy. Although we have witnessed great progress in video action recognition over the past decade, how to train a single model that performs well across multiple datasets remains challenging yet valuable. Here, we propose a novel multi-dataset training paradigm, MultiTrain, with the design of two new loss terms, namely an informative loss and a projection loss, aimed at learning robust representations for action recognition. In particular, the informative loss maximizes the expressiveness of the feature embedding, while the projection loss for each dataset mines the intrinsic relations between classes across datasets. We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet, and Something-Something-V2. Extensive experimental results show that our method can consistently improve state-of-the-art performance. Code and models are released.
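
As a rough illustration of the multi-dataset training objective described in the abstract, the sketch below combines per-dataset classification losses with an informative term and a projection term. The abstract does not give the exact formulations, so the covariance-based informative loss, the learned linear projections between label spaces, and the `backbone`, `heads`, `projections`, `lambda_info`, and `lambda_proj` names are illustrative assumptions, not the paper's actual method.

```python
# Illustrative sketch only: not the paper's exact losses or training code.
import torch
import torch.nn.functional as F


def informative_loss(features: torch.Tensor) -> torch.Tensor:
    # Hypothetical "informative" term: push the batch feature covariance
    # toward the identity so embeddings stay expressive (non-collapsed).
    z = features - features.mean(dim=0, keepdim=True)
    cov = (z.T @ z) / max(z.shape[0] - 1, 1)
    eye = torch.eye(cov.shape[0], device=cov.device)
    return ((cov - eye) ** 2).mean()


def projection_loss(probs_src: torch.Tensor, proj: torch.nn.Linear,
                    logits_tgt: torch.Tensor) -> torch.Tensor:
    # Hypothetical "projection" term: map one dataset's class predictions into
    # another dataset's label space and match them, so that related classes
    # across datasets reinforce each other.
    projected = proj(probs_src)
    return F.kl_div(projected.log_softmax(dim=-1),
                    logits_tgt.softmax(dim=-1), reduction="batchmean")


def multitrain_step(backbone, heads, projections, batches, optimizer,
                    lambda_info=0.1, lambda_proj=0.1):
    # One multi-dataset step: a shared Transformer backbone, one classification
    # head per dataset, plus the two auxiliary terms above.
    optimizer.zero_grad()
    feats = {name: backbone(clips) for name, (clips, _) in batches.items()}
    loss = 0.0
    for name, (_, labels) in batches.items():
        logits = heads[name](feats[name])
        loss = loss + F.cross_entropy(logits, labels)
        loss = loss + lambda_info * informative_loss(feats[name])
    for a in batches:            # cross-dataset relations on the same clips
        for b in batches:
            if a != b:
                probs_a = heads[a](feats[a]).softmax(dim=-1)
                logits_b = heads[b](feats[a])
                loss = loss + lambda_proj * projection_loss(
                    probs_a, projections[(a, b)], logits_b)
    loss.backward()
    optimizer.step()
    return loss.detach()
```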
