Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator

Paper Title

Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator

Authors

Shi, Zifan; Xu, Yinghao; Shen, Yujun; Zhao, Deli; Chen, Qifeng; Yeung, Dit-Yan

Abstract

3D-aware image synthesis aims at learning a generative model that can render photo-realistic 2D images while capturing decent underlying 3D shapes. A popular solution is to adopt the generative adversarial network (GAN) and replace the generator with a 3D renderer, where volume rendering with neural radiance field (NeRF) is commonly used. Despite the advancement of synthesis quality, existing methods fail to obtain moderate 3D shapes. We argue that, considering the two-player game in the formulation of GANs, only making the generator 3D-aware is not enough. In other words, displacing the generative mechanism only offers the capability, but not the guarantee, of producing 3D-aware images, because the supervision of the generator primarily comes from the discriminator. To address this issue, we propose GeoD through learning a geometry-aware discriminator to improve 3D-aware GANs. Concretely, besides differentiating real and fake samples from the 2D image space, the discriminator is additionally asked to derive the geometry information from the inputs, which is then applied as the guidance of the generator. Such a simple yet effective design facilitates learning substantially more accurate 3D shapes. Extensive experiments on various generator architectures and training datasets verify the superiority of GeoD over state-of-the-art alternatives. Moreover, our approach is registered as a general framework such that a more capable discriminator (i.e., with a third task of novel view synthesis beyond domain classification and geometry extraction) can further assist the generator with a better multi-view consistency.
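
As a rough illustration of the idea in the abstract, the generator objective can be sketched as an adversarial term plus a geometry term that ties the generator's geometry to what the discriminator extracts from the image. This is a minimal sketch, not the authors' implementation: the choice of depth maps as the geometry signal, the mean-squared-error penalty, the `geo_weight` parameter, and all function names are assumptions made for illustration only.

```python
import numpy as np


def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy computed on raw logits."""
    return np.mean(
        np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    )


def geometry_guidance_loss(rendered_depth, predicted_depth):
    """Hypothetical geometry term: mean squared error between the depth
    rendered by the generator and the depth the discriminator extracts
    from the synthesized image (depth and MSE are both assumptions)."""
    return np.mean((rendered_depth - predicted_depth) ** 2)


def generator_loss(fake_logits, rendered_depth, predicted_depth, geo_weight=1.0):
    """Adversarial term (fool the real/fake branch) plus a weighted
    geometry term (agree with the discriminator's geometry branch)."""
    adversarial = bce_with_logits(fake_logits, np.ones_like(fake_logits))
    geometry = geometry_guidance_loss(rendered_depth, predicted_depth)
    return adversarial + geo_weight * geometry


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_logits = rng.normal(size=4)              # discriminator scores for 4 fake images
    rendered = rng.uniform(size=(4, 8, 8))        # generator-side depth maps
    predicted = rendered + 0.1 * rng.normal(size=(4, 8, 8))  # discriminator-side estimates
    print(generator_loss(fake_logits, rendered, predicted, geo_weight=0.5))
```

In this sketch the discriminator's real/fake branch still drives image realism, while the extra term penalizes the generator whenever its underlying shape disagrees with the geometry recovered from the rendered image.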
