Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities

Paper Title

Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities

Authors

Fadime Sener, Dibyadip Chatterjee, Daniel Shelepov, Kun He, Dipika Singhania, Robert Wang, Angela Yao

Abstract

Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles. Participants work without fixed instructions, and the sequences feature rich and natural variations in action ordering, mistakes, and corrections. Assembly101 is the first multi-view action dataset, with simultaneous static (8) and egocentric (4) recordings. Sequences are annotated with more than 100K coarse and 1M fine-grained action segments, and 18M 3D hand poses. We benchmark on three action understanding tasks: recognition, anticipation and temporal segmentation. Additionally, we propose a novel task of detecting mistakes. The unique recording format and rich set of annotations allow us to investigate generalization to new toys, cross-view transfer, long-tailed distributions, and pose vs. appearance. We envision that Assembly101 will serve as a new challenge to investigate various activity understanding problems.

