Synthesizing Light Field Video from Monocular Video

Paper title

Synthesizing Light Field Video from Monocular Video

Authors

Shrisudhan Govindarajan, Prasan Shedligeri, Sarah, Kaushik Mitra

Abstract

The hardware challenges associated with light-field (LF) imaging have made it difficult for consumers to access its benefits, such as post-capture focus and aperture control. Learning-based techniques that solve the ill-posed problem of LF reconstruction from sparse (1, 2 or 4) views have significantly reduced the requirement for complex hardware. LF video reconstruction from sparse views poses a special challenge, as acquiring ground truth for training these models is hard. Hence, we propose a self-supervised learning-based algorithm for LF video reconstruction from monocular videos. We use self-supervised geometric, photometric and temporal consistency constraints inspired by a recent self-supervised technique for LF video reconstruction from stereo video. Additionally, we propose three key techniques that are relevant to our monocular video input. First, we propose an explicit disocclusion handling technique that encourages the network to inpaint disoccluded regions in an LF frame using information from adjacent input temporal frames. This is crucial for a self-supervised technique, as a single input frame does not contain any information about the disoccluded regions. Second, we propose an adaptive low-rank representation that provides a significant boost in performance by tailoring the representation to each input scene. Finally, we propose a novel refinement block that exploits the available LF image data using supervised learning to further refine the reconstruction quality. Our qualitative and quantitative analysis demonstrates the significance of each of the proposed building blocks, as well as superior results compared to previous state-of-the-art monocular LF reconstruction techniques. We further validate our algorithm by reconstructing LF videos from monocular videos acquired using a commercial GoPro camera.
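To give a rough sense of what a scene-dependent low-rank representation of a light field can mean, the sketch below stacks the sub-aperture views as columns of a matrix and keeps only as many singular components as a given scene needs. This is an illustrative assumption, not the authors' method: the abstract does not specify their representation, and the truncated SVD, the `energy` threshold, and the function names `low_rank_light_field` and `make_toy_light_field` are all hypothetical choices made for this example.

```python
import numpy as np


def low_rank_light_field(views, energy=0.99):
    """Approximate a light field with a truncated SVD (illustrative only).

    views: array of shape (n_views, H, W), one grayscale sub-aperture view each.
    energy: fraction of squared singular-value energy to retain (assumed knob).
    Returns the approximated views and the rank that was selected.
    """
    n_views, h, w = views.shape
    # Each column of the matrix is one vectorized sub-aperture view.
    matrix = views.reshape(n_views, h * w).T
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Pick the smallest rank whose singular values retain the requested energy,
    # so that simple scenes get a lower rank than complex ones.
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    rank = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return approx.T.reshape(n_views, h, w), rank


def make_toy_light_field(n_views=9, h=16, w=16, seed=0):
    """Toy data: every view is a scaled copy of one image, so the rank is 1."""
    rng = np.random.default_rng(seed)
    base = rng.random((h, w))
    gains = np.linspace(0.5, 1.5, n_views)
    return gains[:, None, None] * base


if __name__ == "__main__":
    toy = make_toy_light_field()
    approx, rank = low_rank_light_field(toy)
    print("selected rank:", rank)
    print("max abs error:", float(np.max(np.abs(approx - toy))))
```

Real light fields contain parallax between views, so they generally need a higher rank than this toy input; the point of the sketch is only that the retained rank can be chosen per scene rather than fixed in advance.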
