Paper Title

Progressive Motion Context Refine Network for Efficient Video Frame Interpolation

Paper Authors

Lingtong Kong, Jinfeng Liu, Jie Yang

Paper Abstract

Recently, flow-based frame interpolation methods have achieved great success by first modeling the optical flow between the target and input frames, and then building a synthesis network for target frame generation. However, this cascaded architecture can lead to large model size and high inference latency, hindering mobile and real-time applications. To solve this problem, we propose a novel Progressive Motion Context Refine Network (PMCRNet) that jointly predicts motion fields and image context for higher efficiency. Unlike methods that synthesize the target frame directly from deep features, we simplify the frame interpolation task by borrowing existing texture from adjacent input frames, so the decoder at each pyramid level of PMCRNet only needs to update the easier intermediate optical flow, occlusion merge mask, and image residual. Moreover, we introduce a new annealed multi-scale reconstruction loss to better guide the learning process of this efficient PMCRNet. Experiments on multiple benchmarks show that the proposed approach not only achieves favorable quantitative and qualitative results but also significantly reduces model size and running time.
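To make the "borrowing existing texture" formulation concrete, the sketch below assembles a target frame from the two warped input frames, an occlusion merge mask, and an image residual, as described in the abstract. It is a minimal PyTorch sketch under stated assumptions: backward warping with bilinear sampling, images normalized to [0, 1], and the function names (`backward_warp`, `synthesize_target`) are illustrative, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    """Warp an input frame toward the target time step with a backward flow field.

    img:  (B, C, H, W) input frame
    flow: (B, 2, H, W) optical flow from the target frame to this input frame
    """
    B, _, H, W = img.shape
    # Build a sampling grid displaced by the flow, normalized to [-1, 1] for grid_sample.
    gy, gx = torch.meshgrid(
        torch.arange(H, device=img.device, dtype=img.dtype),
        torch.arange(W, device=img.device, dtype=img.dtype),
        indexing="ij",
    )
    x = gx + flow[:, 0]
    y = gy + flow[:, 1]
    grid = torch.stack(
        (2.0 * x / (W - 1) - 1.0, 2.0 * y / (H - 1) - 1.0), dim=-1
    )  # (B, H, W, 2)
    return F.grid_sample(img, grid, align_corners=True)

def synthesize_target(img0, img1, flow_t0, flow_t1, mask, residual):
    """Merge the two warped inputs with an occlusion mask and add an image residual.

    mask:     (B, 1, H, W) occlusion merge mask in [0, 1]
    residual: (B, C, H, W) image residual predicted by the decoder
    """
    warp0 = backward_warp(img0, flow_t0)
    warp1 = backward_warp(img1, flow_t1)
    # The occlusion merge mask decides, per pixel, which warped frame to trust.
    merged = mask * warp0 + (1.0 - mask) * warp1
    # The residual corrects texture that cannot be borrowed from either input frame.
    # Clamping assumes images are normalized to [0, 1].
    return torch.clamp(merged + residual, 0.0, 1.0)
```

The design choice illustrated here is why the task becomes easier: the decoder at each pyramid level only refines the flow fields, the merge mask, and a small residual, instead of hallucinating the full target frame from deep features.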
