Paper Title

Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities

Authors

Fadime Sener, Dibyadip Chatterjee, Daniel Shelepov, Kun He, Dipika Singhania, Robert Wang, Angela Yao

Abstract

Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles. Participants work without fixed instructions, and the sequences feature rich and natural variations in action ordering, mistakes, and corrections. Assembly101 is the first multi-view action dataset, with simultaneous static (8) and egocentric (4) recordings. Sequences are annotated with more than 100K coarse and 1M fine-grained action segments, and 18M 3D hand poses. We benchmark on three action understanding tasks: recognition, anticipation and temporal segmentation. Additionally, we propose a novel task of detecting mistakes. The unique recording format and rich set of annotations allow us to investigate generalization to new toys, cross-view transfer, long-tailed distributions, and pose vs. appearance. We envision that Assembly101 will serve as a new challenge to investigate various activity understanding problems.
