Paper Title


Photorealism in Driving Simulations: Blending Generative Adversarial Image Synthesis with Rendering

Authors

Ekim Yurtsever, Dongfang Yang, Ibrahim Mert Koc, Keith A. Redmill

Abstract


Driving simulators play a large role in developing and testing new intelligent vehicle systems. The visual fidelity of the simulation is critical for building vision-based algorithms and conducting human driver experiments. Low visual fidelity breaks immersion for human-in-the-loop driving experiments. Conventional computer graphics pipelines use detailed 3D models, meshes, textures, and rendering engines to generate 2D images from 3D scenes. These processes are labor-intensive, and they do not generate photorealistic imagery. Here we introduce a hybrid generative neural graphics pipeline for improving the visual fidelity of driving simulations. Given a 3D scene, we partially render only important objects of interest, such as vehicles, and use generative adversarial processes to synthesize the background and the rest of the image. To this end, we propose a novel image formation strategy to form 2D semantic images from 3D scenery consisting of simple object models without textures. These semantic images are then converted into photorealistic RGB images with a state-of-the-art Generative Adversarial Network (GAN) trained on real-world driving scenes. This replaces repetitiveness with randomly generated but photorealistic surfaces. Finally, the partially rendered and GAN-synthesized images are blended with a blending GAN. We show that the photorealism of images generated with the proposed method is more similar to real-world driving datasets such as Cityscapes and KITTI than conventional approaches. This comparison is made using semantic retention analysis and Fréchet Inception Distance (FID) measurements.
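The FID measurement mentioned in the abstract reduces to the Fréchet distance between two Gaussians fitted to deep-feature statistics of the real and generated image sets (in the standard formulation, Inception-v3 activations). A minimal numpy sketch of that distance is shown below; the function names are illustrative, not from the paper, and a full FID implementation would additionally run the images through an Inception network to obtain the feature matrices:

```python
import numpy as np


def _sqrtm_psd(mat):
    """Matrix square root of a symmetric positive semi-definite matrix
    via eigendecomposition (clipping tiny negative eigenvalues)."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T


def activation_stats(features):
    """Mean and covariance of an (N, D) feature matrix,
    e.g. Inception activations of N images."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    return mu, sigma


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2}).

    Uses the identity Tr((sigma1 sigma2)^{1/2})
    = Tr((sigma1^{1/2} sigma2 sigma1^{1/2})^{1/2}) so that only
    symmetric PSD square roots are needed.
    """
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    covmean = _sqrtm_psd(s1_half @ sigma2 @ s1_half)
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * np.trace(covmean))
```

Lower values indicate the generated-image feature distribution is closer to the real one; identical statistics give a distance of zero.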
