Paper Title
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Paper Authors
Paper Abstract
Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles. Participants work without fixed instructions, and the sequences feature rich and natural variations in action ordering, mistakes, and corrections. Assembly101 is the first multi-view action dataset, with simultaneous static (8) and egocentric (4) recordings. Sequences are annotated with more than 100K coarse and 1M fine-grained action segments, and 18M 3D hand poses. We benchmark on three action understanding tasks: recognition, anticipation and temporal segmentation. Additionally, we propose a novel task of detecting mistakes. The unique recording format and rich set of annotations allow us to investigate generalization to new toys, cross-view transfer, long-tailed distributions, and pose vs. appearance. We envision that Assembly101 will serve as a new challenge to investigate various activity understanding problems.