Paper Title
Across Scales & Across Dimensions: Temporal Super-Resolution using Deep Internal Learning
Paper Authors
Paper Abstract
When a very fast dynamic event is recorded with a low-framerate camera, the resulting video suffers from severe motion blur (due to exposure time) and motion aliasing (due to low sampling rate in time). True Temporal Super-Resolution (TSR) is more than just Temporal-Interpolation (increasing framerate). It can also recover new high temporal frequencies beyond the temporal Nyquist limit of the input video, thus resolving both motion-blur and motion-aliasing effects that temporal frame interpolation (as sophisticated as it may be) cannot undo. In this paper we propose a "Deep Internal Learning" approach for true TSR. We train a video-specific CNN on examples extracted directly from the low-framerate input video. Our method exploits the strong recurrence of small space-time patches inside a single video sequence, both within and across different spatio-temporal scales of the video. We further observe (for the first time) that small space-time patches recur also across dimensions of the video sequence - i.e., by swapping the spatial and temporal dimensions. In particular, the higher spatial resolution of video frames provides strong examples as to how to increase the temporal resolution of that video. Such internal video-specific examples give rise to strong self-supervision, requiring no data but the input video itself. This results in Zero-Shot Temporal-SR of complex videos, which removes both motion blur and motion aliasing, outperforming previous supervised methods trained on external video datasets.
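The two internal-learning ideas in the abstract - generating training pairs by temporally coarsening the input video itself, and reusing spatial structure as temporal examples by swapping axes - can be sketched roughly as follows. This is a minimal illustration on a toy grayscale (t, h, w) tensor, assuming simple frame averaging as a stand-in for exposure blur; the function names and the exact degradation model are illustrative, not taken from the paper.

```python
import numpy as np

def temporally_coarsen(video, factor=2):
    """Simulate a lower-framerate capture of `video` (shape (t, h, w)):
    average `factor` consecutive frames (exposure-time motion blur),
    which also subsamples the temporal axis (lower sampling rate)."""
    t = (video.shape[0] // factor) * factor
    grouped = video[:t].reshape(-1, factor, *video.shape[1:])
    return grouped.mean(axis=1)

def swap_space_time(video):
    """Swap the temporal axis with a spatial axis: (t, h, w) -> (h, t, w).
    In such x-t slices, spatial detail along the original h axis can serve
    as high-resolution examples for the temporal axis."""
    return np.transpose(video, (1, 0, 2))

# Internal training pair: the input video is the high-framerate target
# for its own temporally coarsened version (no external data needed).
video = np.random.rand(16, 32, 32)              # toy (t, h, w) clip
low_fps = temporally_coarsen(video, factor=2)   # coarse input, (8, 32, 32)
xt_view = swap_space_time(video)                # cross-dimension view, (32, 16, 32)
```

A video-specific network would then be trained to map `low_fps` back to `video` (and analogously on axis-swapped views), and applied to the original input to hallucinate frequencies beyond its temporal Nyquist limit.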