Paper Title

DreamFusion: Text-to-3D using 2D Diffusion

Paper Authors

Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall

Paper Abstract

Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs. Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D data and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. We introduce a loss based on probability density distillation that enables the use of a 2D diffusion model as a prior for optimization of a parametric image generator. Using this loss in a DeepDream-like procedure, we optimize a randomly-initialized 3D model (a Neural Radiance Field, or NeRF) via gradient descent such that its 2D renderings from random angles achieve a low loss. The resulting 3D model of the given text can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment. Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.
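As a rough illustration of the optimization loop the abstract describes, the sketch below mocks up the score-distillation update in Python/PyTorch: a parametric image generator (standing in for the NeRF) is rendered from a random viewpoint, the rendering is noised, a frozen 2D denoiser predicts that noise, and the weighted residual is pushed back through the renderer only. `ToyRenderer`, `ToyDenoiser`, the noise schedule, and all hyperparameters here are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of a score-distillation-style update, assuming toy placeholder
# modules (NOT the paper's code): render from a random view, add noise, let a
# frozen 2D denoiser predict the noise, and push the weighted residual back
# into the 3D model's parameters only.
import torch
import torch.nn as nn

class ToyRenderer(nn.Module):
    """Hypothetical stand-in for a NeRF: maps a camera angle to a 3x64x64 image."""
    def __init__(self):
        super().__init__()
        self.image = nn.Parameter(torch.rand(1, 3, 64, 64))  # stand-in for NeRF weights
    def forward(self, camera_angle):
        # A real NeRF would volume-render along rays for this camera pose.
        return torch.sigmoid(self.image + 0.0 * camera_angle)

class ToyDenoiser(nn.Module):
    """Hypothetical stand-in for a frozen, pretrained text-to-image diffusion model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, 3, padding=1)
    def forward(self, noisy_image, t, text_embedding):
        return self.net(noisy_image)  # predicted noise

renderer, denoiser = ToyRenderer(), ToyDenoiser()
for p in denoiser.parameters():            # the 2D diffusion prior stays frozen
    p.requires_grad_(False)
opt = torch.optim.Adam(renderer.parameters(), lr=1e-2)
text_embedding = torch.randn(1, 128)       # hypothetical embedding of the prompt

alphas = torch.linspace(0.999, 0.01, 1000)  # toy noise schedule
for step in range(100):
    camera_angle = torch.rand(1) * 2 * torch.pi          # random viewpoint
    rendering = renderer(camera_angle)

    t = torch.randint(20, 980, (1,))                     # random diffusion timestep
    alpha_t = alphas[t].view(1, 1, 1, 1)
    noise = torch.randn_like(rendering)
    noisy = alpha_t.sqrt() * rendering + (1 - alpha_t).sqrt() * noise

    eps_hat = denoiser(noisy, t, text_embedding)
    # Treat w(t) * (predicted noise - true noise) as the gradient w.r.t. the
    # rendering, without backpropagating through the diffusion model itself.
    w_t = 1 - alpha_t
    grad = (w_t * (eps_hat - noise)).detach()

    opt.zero_grad()
    rendering.backward(gradient=grad)   # chain the gradient through the renderer only
    opt.step()
```

Treating the residual as a ready-made gradient for the rendering, rather than differentiating through the diffusion model, is what lets an off-the-shelf, unmodified image diffusion model act as the prior in this kind of procedure.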
