Paper Title
ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers
Paper Authors
Paper Abstract
Novel view synthesis is a long-standing problem. In this work, we consider a variant of the problem where we are given only a few context views sparsely covering a scene or an object. The goal is to predict novel viewpoints in the scene, which requires learning priors. The current state of the art is based on Neural Radiance Fields (NeRF), and while these methods achieve impressive results, they suffer from long training times because they must evaluate millions of 3D point samples through a neural network for each image. We propose a 2D-only method that maps multiple context views and a query pose to a new image in a single pass of a neural network. Our model uses a two-stage architecture consisting of a codebook and a transformer model. The codebook is used to embed individual images into a smaller latent space, and the transformer solves the view synthesis task in this more compact space. To train our model efficiently, we introduce a novel branching attention mechanism that allows us to use the same model not only for neural rendering but also for camera pose estimation. Experimental results on real-world scenes show that our approach is competitive with NeRF-based methods while not reasoning explicitly in 3D, and it is faster to train.
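To make the two-stage architecture concrete, here is a minimal PyTorch sketch of the pipeline the abstract describes: a VQ-style codebook encoder that turns each image into a short sequence of discrete latent tokens, and a transformer that attends over the context views' tokens and camera poses to predict the tokens of the query view. All class names, the pose parameterization, and the hyperparameters are illustrative assumptions rather than the authors' implementation, and the paper's branching attention mechanism (which also yields pose estimates) is omitted for brevity.

```python
# Minimal sketch of the two-stage codebook + transformer pipeline.
# Hypothetical names and hyperparameters; not the authors' code.
import torch
import torch.nn as nn


class CodebookEncoder(nn.Module):
    """Stage 1 (hypothetical): embed an image into a grid of discrete latent codes."""

    def __init__(self, num_codes: int = 1024, dim: int = 256):
        super().__init__()
        # Downsampling CNN: a 128x128 image becomes a 16x16 feature grid.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, dim, 4, stride=2, padding=1),
        )
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.conv(images)                # (B, dim, h, w)
        flat = feats.flatten(2).transpose(1, 2)  # (B, h*w, dim)
        # Vector quantization: snap each feature to its nearest codebook entry.
        dists = torch.cdist(flat, self.codebook.weight.unsqueeze(0))
        return dists.argmin(-1)                  # (B, h*w) integer tokens


class ViewSynthesisTransformer(nn.Module):
    """Stage 2 (hypothetical): attend over context tokens and poses to
    predict the discrete tokens of the query view."""

    def __init__(self, num_codes: int = 1024, dim: int = 256,
                 tokens_per_image: int = 256):
        super().__init__()
        self.token_emb = nn.Embedding(num_codes, dim)
        self.pose_emb = nn.Linear(7, dim)  # e.g. translation + quaternion
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=6)
        # Learnable slots that get filled in with the query view's tokens.
        self.query_slots = nn.Parameter(torch.randn(tokens_per_image, dim))
        self.to_codes = nn.Linear(dim, num_codes)
        self.tokens_per_image = tokens_per_image

    def forward(self, context_tokens, context_poses, query_pose):
        # context_tokens: (B, N_views, T) ints, context_poses: (B, N_views, 7),
        # query_pose: (B, 7).
        x = self.token_emb(context_tokens.flatten(1))  # (B, N*T, dim)
        p = self.pose_emb(context_poses)               # (B, N, dim)
        q = self.pose_emb(query_pose).unsqueeze(1) + self.query_slots  # (B, T, dim)
        out = self.transformer(torch.cat([x, p, q], dim=1))
        # Logits over codebook entries for each query-view token.
        return self.to_codes(out[:, -self.tokens_per_image:])  # (B, T, num_codes)
```

In the full pipeline, a codebook decoder (the encoder's counterpart) would map the predicted tokens back to pixels, and the codebook stage would be trained before the transformer. Operating on a few hundred latent tokens per image, rather than millions of 3D point samples, is what enables the single-pass rendering and shorter training times the abstract claims.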