Title
Holistic 3D Human and Scene Mesh Estimation from Single View Images
Authors
Abstract
The 3D world limits the human body pose and the human body pose conveys information about the surrounding objects. Indeed, from a single image of a person placed in an indoor scene, we as humans are adept at resolving ambiguities of the human pose and room layout through our knowledge of the physical laws and prior perception of the plausible object and human poses. However, few computer vision models fully leverage this fact. In this work, we propose an end-to-end trainable model that perceives the 3D scene from a single RGB image, estimates the camera pose and the room layout, and reconstructs both human body and object meshes. By imposing a set of comprehensive and sophisticated losses on all aspects of the estimations, we show that our model outperforms existing human body mesh methods and indoor scene reconstruction methods. To the best of our knowledge, this is the first model that outputs both object and human predictions at the mesh level, and performs joint optimization on the scene and human poses.
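As a minimal sketch of the idea behind "imposing a set of comprehensive and sophisticated losses on all aspects of the estimations", joint optimization typically reduces to minimizing a weighted sum of per-aspect loss terms. The term names and weights below are hypothetical illustrations, not the paper's actual implementation:

```python
# Hypothetical sketch of a joint objective combining human-mesh and scene
# loss terms. All term names and weights are illustrative assumptions.

def total_loss(losses, weights):
    """Weighted sum of individual loss terms (hypothetical)."""
    return sum(weights[name] * value for name, value in losses.items())

# Example per-aspect losses: human mesh, object meshes, room layout, camera pose.
losses = {"human_mesh": 0.8, "object_mesh": 1.2, "room_layout": 0.5, "camera": 0.1}
weights = {"human_mesh": 1.0, "object_mesh": 1.0, "room_layout": 0.5, "camera": 2.0}
print(total_loss(losses, weights))  # prints 2.45
```

In an end-to-end trainable model, each term would be differentiable with respect to the shared predictions, so gradients from the scene terms also constrain the human pose and vice versa.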