Paper Title
Spatiotemporal Bundle Adjustment for Dynamic 3D Human Reconstruction in the Wild
Paper Authors
Paper Abstract
Bundle adjustment jointly optimizes camera intrinsics and extrinsics and 3D point triangulation to reconstruct a static scene. The triangulation constraint, however, is invalid for moving points captured in multiple unsynchronized videos, and bundle adjustment is not designed to estimate the temporal alignment between cameras. We present a spatiotemporal bundle adjustment framework that jointly optimizes four coupled sub-problems: estimating camera intrinsics and extrinsics, triangulating static 3D points, computing the sub-frame temporal alignment between cameras, and reconstructing 3D trajectories of dynamic points. Key to our joint optimization is the careful integration of physics-based motion priors within the reconstruction pipeline, validated on a large motion capture corpus of human subjects. We devise an incremental reconstruction and alignment algorithm that strictly enforces the motion prior during spatiotemporal bundle adjustment. A divide-and-conquer scheme makes this algorithm more efficient while still maintaining high accuracy. We apply the algorithm to reconstruct 3D motion trajectories of human bodies in dynamic events captured in the wild by multiple uncalibrated and unsynchronized video cameras. To make the reconstruction visually more interpretable, we fit a statistical 3D human body model to the asynchronous video streams. Compared to the baseline, this fitting benefits significantly from the proposed spatiotemporal bundle adjustment procedure. Because the videos are aligned with sub-frame precision, we reconstruct 3D motion at a much higher temporal resolution than that of the input videos.
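The core idea, estimating a dynamic point's trajectory and the sub-frame time offset between cameras in one joint optimization, can be illustrated with a toy sketch. This is not the paper's algorithm: the camera poses, the quadratic trajectory model (a crude stand-in for the physics-based motion prior), and all parameter values below are hypothetical, the intrinsics and extrinsics are held fixed rather than optimized, and the solver is initialized near the ground truth instead of using the paper's incremental scheme.

```python
import numpy as np
from scipy.optimize import least_squares

# Shared pinhole intrinsics (hypothetical values).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

# Two fixed cameras; the paper additionally refines intrinsics/extrinsics.
cams = [(np.eye(3), np.array([0.0, 0.0, 5.0])),
        (rot_y(0.5), np.array([0.0, 0.0, 5.0]))]

def project(R, t, X):
    x = K @ (R @ X + t)
    return x[:2] / x[2]

# Quadratic trajectory X(t) = a + b*t + c*t^2, a stand-in for the
# physics-based motion prior that constrains how the point may move.
def traj(coef, t):
    a, b, c = coef
    return a + b * t + c * t * t

true_coef = np.array([[-0.5, 0.5, 0.0],    # a
                      [0.5, 0.0, 0.2],     # b
                      [0.0, 0.3, 0.0]])    # c
true_d1 = 0.033        # unknown sub-frame offset of camera 1 (seconds)
fps, n_frames = 10.0, 10

# Synthesize 2D observations at each camera's (offset) frame times.
obs = []
for ci, (R, tt) in enumerate(cams):
    d = 0.0 if ci == 0 else true_d1
    for f in range(n_frames):
        obs.append((ci, f, project(R, tt, traj(true_coef, f / fps + d))))

def residuals(p):
    # p packs the 9 trajectory coefficients and camera 1's time offset.
    coef, d1 = p[:9].reshape(3, 3), p[9]
    r = []
    for ci, f, uv in obs:
        tm = f / fps + (0.0 if ci == 0 else d1)
        R, tt = cams[ci]
        r.extend(project(R, tt, traj(coef, tm)) - uv)  # reprojection error
    return np.array(r)

rng = np.random.default_rng(0)
p0 = np.concatenate([true_coef.ravel() + 0.1 * rng.standard_normal(9),
                     [0.0]])       # start with zero temporal offset
sol = least_squares(residuals, p0)
d1_est = sol.x[9]
```

Because the trajectory is curved, shifting camera 1's sample times changes where its rays should intersect the curve, which is what makes the sub-frame offset observable at all; for this noiseless toy problem the solver recovers `true_d1` to high precision.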