Spatio-Temporal Action Detection Under Large Motion

Paper Title

Spatio-Temporal Action Detection Under Large Motion

Authors

Gurkirt Singh, Vasileios Choutas, Suman Saha, Fisher Yu, Luc Van Gool

Abstract

Current methods for spatiotemporal action tube detection often extend a bounding box proposal at a given keyframe into a 3D temporal cuboid and pool features from nearby frames. However, such pooling fails to accumulate meaningful spatiotemporal features if the position or shape of the actor shows large 2D motion and variability across frames, due to large camera motion, large actor shape deformation, fast actor action, and so on. In this work, we study the performance of cuboid-aware feature aggregation in action detection under large motion. Further, we propose to enhance actor feature representation under large motion by tracking actors and performing temporal feature aggregation along the respective tracks. We define actor motion using the intersection-over-union (IoU) between the boxes of action tubes/tracks at various fixed time scales. An action with large motion results in lower IoU over time, while slower actions maintain higher IoU. We find that track-aware feature aggregation consistently achieves a large improvement in action detection performance compared to the cuboid-aware baseline, especially for actions under large motion. As a result, we also report state-of-the-art results on the large-scale MultiSports dataset. Code is available at https://github.com/gurkirt/ActionTrackDetectron.
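
The IoU-based notion of actor motion described in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names `box_iou` and `track_motion_iou`, the `(x1, y1, x2, y2)` box layout, and the choice of averaging IoU over all box pairs a fixed number of frames apart are assumptions made here for illustration only.

```python
from typing import List, Sequence

# Assumed box layout: (x1, y1, x2, y2) in pixels.
Box = Sequence[float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def track_motion_iou(track: List[Box], offset: int) -> float:
    """Mean IoU between boxes of one track that are `offset` frames apart.

    Hypothetical helper: lower values indicate larger actor motion at this
    time scale; a track shorter than `offset` + 1 frames returns 1.0.
    """
    pairs = [(track[t], track[t + offset]) for t in range(len(track) - offset)]
    if not pairs:
        return 1.0
    return sum(box_iou(p, q) for p, q in pairs) / len(pairs)


# A 20x20 box that never moves versus one shifting 10 px per frame.
static_track = [(0.0, 0.0, 20.0, 20.0) for _ in range(5)]
fast_track = [(10.0 * t, 0.0, 10.0 * t + 20.0, 20.0) for t in range(5)]
```

With these toy tracks, the static actor keeps a high IoU at every offset, while the fast actor's IoU drops as the offset grows, which matches the qualitative behaviour the abstract describes. Any thresholds for splitting actions into motion categories would have to come from the paper itself.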
