Paper Title
A New Dataset and Transformer for Stereoscopic Video Super-Resolution
Paper Authors
Paper Abstract
Stereo video super-resolution (SVSR) aims to enhance the spatial resolution of a low-resolution video by reconstructing a high-resolution version of it. The key challenges in SVSR are preserving stereo consistency and temporal consistency, without which viewers may experience 3D fatigue. There are several notable works on stereoscopic image super-resolution, but there is little research on stereo video super-resolution. In this paper, we propose a novel Transformer-based model for SVSR, namely Trans-SVSR. Trans-SVSR comprises two key novel components: a spatio-temporal convolutional self-attention layer and an optical flow-based feed-forward layer that discovers the correlation across different video frames and aligns the features. The parallax attention mechanism (PAM), which uses cross-view information to account for large disparities, is used to fuse the stereo views. Due to the lack of a benchmark dataset suitable for the SVSR task, we collected a new stereoscopic video dataset, SVSR-Set, containing 71 full high-definition (HD) stereo videos captured using a professional stereo camera. Extensive experiments on the collected dataset, along with two other datasets, demonstrate that Trans-SVSR can achieve competitive performance compared with state-of-the-art methods. Project code and additional results are available at https://github.com/H-deep/Trans-SVSR/
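
To make the idea of fusing stereo views with parallax attention more concrete, the following is a minimal NumPy sketch of attention computed along the horizontal (epipolar) direction of rectified left and right feature maps. It is an illustrative simplification and not the Trans-SVSR implementation: the function name `parallax_attention`, the absence of learned projection layers, and the `sqrt(C)` scaling are all assumptions made for this example.

```python
import numpy as np


def parallax_attention(left, right):
    """Simplified parallax attention between rectified stereo feature maps.

    left, right: arrays of shape (H, W, C).
    Returns (warped_right, attn) with shapes (H, W, C) and (H, W, W).
    Hypothetical helper for illustration; no learned projections are used.
    """
    _, _, c = left.shape
    # For every row h, compare each left position i with each right position j.
    # Restricting the search to the same row reflects the epipolar constraint
    # of rectified stereo pairs.
    scores = np.einsum("hic,hjc->hij", left, right)
    # Assumption: scaling borrowed from standard dot-product attention.
    scores = scores / np.sqrt(c)
    # Numerically stable softmax over the right-view positions j.
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    # Aggregate right-view features for each left-view position.
    warped_right = np.einsum("hij,hjc->hic", attn, right)
    return warped_right, attn


rng = np.random.default_rng(0)
left_feat = rng.standard_normal((4, 8, 16))
right_feat = rng.standard_normal((4, 8, 16))
warped, attn = parallax_attention(left_feat, right_feat)
print(warped.shape, attn.shape)
```

Because the attention map spans the full image width for each row, this formulation does not need a fixed maximum disparity, which is one reason attention of this kind is a natural fit for stereo pairs with large disparities.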