MM-SEAL: A Large-scale Video Dataset of Multi-person Multi-grained Spatio-temporally Action Localization

Paper title

MM-SEAL: A Large-scale Video Dataset of Multi-person Multi-grained Spatio-temporally Action Localization

Authors

Chen, Shimin, Li, Wei, Chen, Chen, Gu, Jianyang, Chu, Jiaming, Tao, Xunqiang, Guo, Yandong

Abstract

In this paper, we introduce a novel large-scale video dataset, dubbed MM-SEAL, for multi-person multi-grained spatio-temporal action localization in human daily life. We are the first to propose a benchmark for multi-person spatio-temporal complex activity localization, where complex semantics and long durations bring new challenges to localization tasks. We observe that a limited set of atomic actions can be combined into many complex activities. MM-SEAL provides both atomic action and complex activity annotations, yielding 111.7k atomic actions spanning 172 action categories and 17.7k complex activities spanning 200 activity categories. We explore the relationship between atomic actions and complex activities, finding that atomic action features can improve complex activity localization performance. We also propose a new network, termed Faster-TAD, which generates temporal proposals and labels simultaneously. Finally, our evaluations show that visual features pretrained on MM-SEAL can improve performance on other action localization benchmarks. We will release the dataset and the project code upon publication of the paper.
