Paper Title
STAU: A SpatioTemporal-Aware Unit for Video Prediction and Beyond
Paper Authors
Paper Abstract
Video prediction aims to predict future frames by modeling the complex spatiotemporal dynamics in videos. However, most existing methods model the temporal information and the spatial information of videos independently and have not fully explored the correlations between the two. In this paper, we propose a SpatioTemporal-Aware Unit (STAU) for video prediction and beyond by exploring the significant spatiotemporal correlations in videos. On the one hand, motion-aware attention weights are learned from the spatial states to help aggregate the temporal states in the temporal domain. On the other hand, appearance-aware attention weights are learned from the temporal states to help aggregate the spatial states in the spatial domain. In this way, the temporal information and the spatial information become mutually aware in both domains, and the spatiotemporal receptive field is greatly broadened for more reliable spatiotemporal modeling. Experiments are conducted not only on traditional video prediction tasks but also on tasks beyond video prediction, including early action recognition and object detection. Experimental results show that our STAU outperforms other methods on all tasks in terms of both performance and computational efficiency.
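The cross-domain aggregation described in the abstract can be sketched schematically. The following is a minimal NumPy illustration, not the authors' implementation: the state tensors, feature dimension, and dot-product attention form are all assumptions made for illustration. It only shows the key idea that attention weights over temporal states are derived from the current spatial state (motion-aware), while weights over spatial states are derived from the current temporal state (appearance-aware).

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 8    # feature dimension (illustrative choice)
tau = 5  # number of past states kept (illustrative choice)

# Hypothetical past states: one row per previous time step.
T_states = rng.standard_normal((tau, d))  # past temporal states
S_states = rng.standard_normal((tau, d))  # past spatial states
s_t = rng.standard_normal(d)              # current spatial state
t_t = rng.standard_normal(d)              # current temporal state

# Motion-aware attention: weights computed from the spatial state
# aggregate the past temporal states in the temporal domain.
alpha = softmax(T_states @ s_t)  # (tau,) attention weights
t_agg = alpha @ T_states         # aggregated temporal state, (d,)

# Appearance-aware attention: weights computed from the temporal state
# aggregate the past spatial states in the spatial domain.
beta = softmax(S_states @ t_t)   # (tau,) attention weights
s_agg = beta @ S_states          # aggregated spatial state, (d,)

print(t_agg.shape, s_agg.shape)
```

In the actual unit the attention weights would be produced by learned projections rather than raw dot products, but the flow of information between the two domains follows this pattern.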