Paper Title
Multi-View Consistent Generative Adversarial Networks for 3D-aware Image Synthesis
Paper Authors
Paper Abstract
3D-aware image synthesis aims to generate images of objects from multiple views by learning a 3D representation. However, one key challenge remains: existing approaches lack geometry constraints, and hence usually fail to generate multi-view consistent images. To address this challenge, we propose Multi-View Consistent Generative Adversarial Networks (MVCGAN) for high-quality 3D-aware image synthesis with geometry constraints. By leveraging the underlying 3D geometry information of generated images, i.e., depth maps and camera transformation matrices, we explicitly establish stereo correspondence between views to perform multi-view joint optimization. In particular, we enforce photometric consistency between pairs of views and integrate a stereo mixup mechanism into the training process, encouraging the model to reason about the correct 3D shape. In addition, we design a two-stage training strategy with feature-level multi-view joint optimization to improve the image quality. Extensive experiments on three datasets demonstrate that MVCGAN achieves state-of-the-art performance for 3D-aware image synthesis.
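The photometric consistency described in the abstract can be illustrated with a minimal sketch: back-project target-view pixels to 3D using the predicted depth, move them into the source view with the relative camera pose, re-project, and compare intensities. The function below is an assumption-laden simplification (single-channel images, nearest-neighbour sampling instead of the differentiable bilinear warping a trainable pipeline would need; all names are hypothetical, not from the paper's code).

```python
import numpy as np

def photometric_loss(tgt_img, src_img, tgt_depth, K, R, t):
    """Mean absolute photometric error between the target image and the
    source image warped into the target view.

    tgt_img, src_img : (h, w) grayscale images
    tgt_depth        : (h, w) depth of the target view
    K                : (3, 3) camera intrinsics
    R, t             : relative pose mapping target-frame points into
                       the source frame
    """
    h, w = tgt_depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)]).reshape(3, -1).astype(float)

    # Back-project target pixels to 3D points, then move them into
    # the source camera frame and re-project with the intrinsics.
    pts_tgt = np.linalg.inv(K) @ pix * tgt_depth.reshape(1, -1)
    pts_src = R @ pts_tgt + t.reshape(3, 1)
    proj = K @ pts_src

    # Nearest-neighbour lookup (a real implementation would use
    # differentiable bilinear sampling so gradients flow to depth).
    us = np.round(proj[0] / proj[2]).astype(int)
    vs = np.round(proj[1] / proj[2]).astype(int)
    valid = (us >= 0) & (us < w) & (vs >= 0) & (vs < h) & (proj[2] > 0)

    warped = np.zeros(h * w)
    warped[valid] = src_img[vs[valid], us[valid]]
    return np.abs(warped - tgt_img.reshape(-1))[valid].mean()
```

With an identity pose and the same image as both views, the warp is the identity and the loss is zero; any depth or pose error shows up directly as photometric error, which is what drives the multi-view joint optimization.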