Paper Title
LaTeRF: Label and Text Driven Object Radiance Fields

Authors

Ashkan Mirzaei, Yash Kant, Jonathan Kelly, Igor Gilitschenski

Abstract

Obtaining 3D object representations is important for creating photo-realistic simulations and for collecting AR and VR assets. Neural fields have shown their effectiveness in learning a continuous volumetric representation of a scene from 2D images, but acquiring object representations from these models with weak supervision remains an open challenge. In this paper we introduce LaTeRF, a method for extracting an object of interest from a scene given 2D images of the entire scene, known camera poses, a natural language description of the object, and a set of point-labels of object and non-object points in the input images. To faithfully extract the object from the scene, LaTeRF extends the NeRF formulation with an additional `objectness' probability at each 3D point. Additionally, we leverage the rich latent space of a pre-trained CLIP model combined with our differentiable object renderer, to inpaint the occluded parts of the object. We demonstrate high-fidelity object extraction on both synthetic and real-world datasets and justify our design choices through an extensive ablation study.
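The abstract states that LaTeRF extends the NeRF formulation with a per-point "objectness" probability so that only object content contributes to the rendered image. A minimal sketch of one plausible instantiation follows, where the objectness probability modulates the volume density before standard alpha compositing; the function name, array shapes, and the exact way objectness enters the rendering equation are assumptions for illustration, not the paper's verbatim formulation.

```python
import numpy as np

def render_object_ray(sigma, rgb, objectness, deltas):
    """Volume-render a single ray, keeping only 'object' content.

    sigma:      (N,) volume densities at samples along the ray
    rgb:        (N, 3) colors at the samples
    objectness: (N,) per-point object probabilities in [0, 1]
    deltas:     (N,) distances between consecutive samples

    Assumption: the objectness probability scales the density, so
    non-object points become transparent (the paper may combine the
    two quantities differently).
    """
    eff_sigma = sigma * objectness               # suppress non-object points
    alpha = 1.0 - np.exp(-eff_sigma * deltas)    # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))  # transmittance
    weights = trans * alpha                      # standard NeRF compositing weights
    return (weights[:, None] * rgb).sum(axis=0)  # (3,) rendered object color
```

With objectness fixed at 1 everywhere this reduces to ordinary NeRF volume rendering, and with objectness 0 the ray renders as empty space, which is the behavior the extraction objective relies on.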
