Paper Title

Is Appearance Free Action Recognition Possible?

Authors

Filip Ilic, Thomas Pock, Richard P. Wildes

Abstract

Intuition might suggest that motion and dynamic information are key to video-based action recognition. In contrast, there is evidence that state-of-the-art deep-learning video understanding architectures are biased toward static information available in single frames. Presently, a methodology and corresponding dataset to isolate the effects of dynamic information in video are missing. Their absence makes it difficult to understand how well contemporary architectures capitalize on dynamic vs. static information. We respond with a novel Appearance Free Dataset (AFD) for action recognition. AFD is devoid of static information relevant to action recognition in a single frame. Modeling of the dynamics is necessary for solving the task, as the action is only apparent through consideration of the temporal dimension. We evaluated 11 contemporary action recognition architectures on AFD as well as its related RGB video. Our results show a notable decrease in performance for all architectures on AFD compared to RGB. We also conducted a complementary study with humans that shows their recognition accuracy on AFD and RGB is very similar and much better than the evaluated architectures on AFD. Our results motivate a novel architecture that revives explicit recovery of optical flow within a contemporary design for best performance on AFD and RGB.
