Paper Title

Globally Consistent Video Depth and Pose Estimation with Efficient Test-Time Training

Paper Authors

Yao-Chih Lee, Kuan-Wei Tseng, Guan-Sheng Chen, Chu-Song Chen

Paper Abstract

Dense depth and pose estimation is a vital prerequisite for various video applications. Traditional solutions suffer from the limited robustness of sparse feature tracking and from insufficient camera baselines in videos. Recent methods therefore leverage learning-based optical flow and depth priors to estimate dense depth. However, previous works either require heavy computation time or yield sub-optimal depth results. In this paper we present GCVD, a globally consistent method for learning-based video structure from motion (SfM). GCVD integrates a compact pose graph into CNN-based optimization, achieving globally consistent estimation through an effective keyframe selection mechanism. It improves the robustness of learning-based methods with flow-guided keyframes and a well-established depth prior. Experimental results show that GCVD outperforms state-of-the-art methods on both depth and pose estimation. Moreover, runtime experiments show that it is efficient on both short and long videos while providing global consistency.
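The abstract does not include code, so the following is only a minimal PyTorch sketch of the pose-graph idea it describes: keyframe poses are refined jointly against relative-pose constraints between keyframe pairs. The function names (hat, exp_so3, pose_graph_loss), the chordal least-squares residual, and the Adam-based optimization are illustrative assumptions, not GCVD's released implementation, which additionally couples the graph with CNN-based depth refinement.

```python
import torch

def hat(k):
    """Skew-symmetric (cross-product) matrix of a 3-vector, built
    with torch.stack so gradients flow through the entries."""
    z = torch.zeros((), dtype=k.dtype)
    return torch.stack([torch.stack([z, -k[2], k[1]]),
                        torch.stack([k[2], z, -k[0]]),
                        torch.stack([-k[1], k[0], z])])

def exp_so3(r):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = r.norm().clamp(min=1e-8)
    K = hat(r / theta)
    I = torch.eye(3, dtype=r.dtype)
    return I + torch.sin(theta) * K + (1.0 - torch.cos(theta)) * (K @ K)

def pose_graph_loss(rot, trans, edges):
    """Sum of squared relative-pose residuals over keyframe-pair edges.

    rot   : (N, 3) axis-angle keyframe rotations (optimized).
    trans : (N, 3) keyframe translations (optimized).
    edges : iterable of (i, j, R_ij, t_ij) relative-pose measurements,
            e.g. obtained from flow-guided pairwise registration.
    """
    loss = trans.new_zeros(())
    for i, j, R_ij, t_ij in edges:
        R_i, R_j = exp_so3(rot[i]), exp_so3(rot[j])
        R_pred = R_j @ R_i.transpose(0, 1)       # predicted relative rotation
        t_pred = trans[j] - R_pred @ trans[i]    # predicted relative translation
        loss = loss + (R_pred - R_ij).pow(2).sum() + (t_pred - t_ij).pow(2).sum()
    return loss

# Toy usage: two keyframes, one edge demanding identity relative motion.
rot = (1e-3 * torch.randn(2, 3)).requires_grad_()   # near-identity init
trans = torch.tensor([[0., 0., 0.], [0.5, 0., 0.]], requires_grad=True)
edges = [(0, 1, torch.eye(3), torch.zeros(3))]
opt = torch.optim.Adam([rot, trans], lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    pose_graph_loss(rot, trans, edges).backward()
    opt.step()
print(trans.detach())   # the two translations converge toward a common value
```

The sketch isolates only the pose-graph portion; in the paper's setting such constraints are kept compact via keyframe selection, which is what makes test-time optimization over long videos tractable.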
