Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction

Paper Title

Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction

Authors

Beibei Jin, Yu Hu, Qiankun Tang, Jingyu Niu, Zhiping Shi, Yinhe Han, Xiaowei Li

Abstract

Video prediction is a pixel-wise dense prediction task that infers future frames from past frames. Missing appearance details and motion blur remain two major problems for current predictive models, leading to image distortion and temporal inconsistency. In this paper, we point out the necessity of exploring multi-frequency analysis to address both problems. Inspired by the frequency band decomposition characteristic of the Human Visual System (HVS), we propose a video prediction network based on multi-level wavelet analysis that handles spatial and temporal information in a unified manner. Specifically, a multi-level spatial discrete wavelet transform decomposes each video frame into anisotropic sub-bands at multiple frequencies, which helps enrich structural information and preserve fine details. In addition, a multi-level temporal discrete wavelet transform, operating along the time axis, decomposes the frame sequence into sub-band groups of different frequencies to accurately capture multi-frequency motions under a fixed frame rate. Extensive experiments on diverse datasets demonstrate that our model achieves significant improvements in fidelity and temporal consistency over state-of-the-art methods.
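To make the two decompositions named in the abstract concrete, the following is a minimal sketch of a discrete wavelet transform applied spatially (within a frame) and temporally (along the time axis). It assumes the Haar wavelet and plain Python lists; the abstract does not say which wavelet or implementation the authors use, and the function names `haar_1d`, `haar_multilevel`, and `spatial_haar_2d` are hypothetical. The sub-band labels follow one common convention and may differ from the paper's.

```python
import math


def haar_1d(signal):
    """One level of the orthonormal Haar DWT on an even-length sequence.

    Returns (low, high): the approximation and detail coefficients.
    """
    s = 1 / math.sqrt(2)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def haar_multilevel(signal, levels):
    """Multi-level decomposition: repeatedly split the low-frequency band.

    Applied to one pixel's intensity over time, this plays the role of a
    temporal DWT. Returns (final_low, [detail_level_1, detail_level_2, ...]).
    """
    low = list(signal)
    details = []
    for _ in range(levels):
        low, high = haar_1d(low)
        details.append(high)
    return low, details


def spatial_haar_2d(frame):
    """One-level 2D Haar DWT of a frame given as a list of equal-length rows.

    Height and width must be even. Rows are filtered first, then columns.
    Returns four sub-bands (ll, lh, hl, hh), each half the size per axis.
    """
    # Filter every row, then regroup into a low band and a high band.
    row_low, row_high = zip(*(haar_1d(row) for row in frame))

    def filter_columns(band):
        cols = [haar_1d(list(col)) for col in zip(*band)]
        low = [list(r) for r in zip(*(c[0] for c in cols))]
        high = [list(r) for r in zip(*(c[1] for c in cols))]
        return low, high

    ll, lh = filter_columns(row_low)
    hl, hh = filter_columns(row_high)
    return ll, lh, hl, hh
```

In this sketch, a flat frame puts all of its energy in `ll` and leaves the other sub-bands at zero, while edges and texture show up in the detail sub-bands. Likewise, a pixel that changes slowly over time contributes mainly to the temporal low band, and fast motion contributes to the detail bands. A multi-level spatial transform would apply `spatial_haar_2d` again to `ll`.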
