Paper Title
Deep Non-rigid Structure-from-Motion: A Sequence-to-Sequence Translation Perspective
Paper Authors
Abstract
Directly regressing the non-rigid shape and camera pose from individual 2D frames is ill-suited to the Non-Rigid Structure-from-Motion (NRSfM) problem. Such a frame-by-frame 3D reconstruction pipeline overlooks the inherent spatial-temporal nature of NRSfM, i.e., reconstructing the whole 3D sequence from the input 2D sequence. In this paper, we propose to model deep NRSfM from a sequence-to-sequence translation perspective, where the input 2D frame sequence is taken as a whole to reconstruct the deforming 3D non-rigid shape sequence. First, we apply a shape-motion predictor to estimate the initial non-rigid shape and camera motion from a single frame. Then, we propose a context modeling module to model camera motions and complex non-rigid shapes. To tackle the difficulty of enforcing the global structure constraint within the deep framework, we propose to impose the union-of-subspace structure by replacing the self-expressiveness layer with multi-head attention and delayed regularizers, which enables end-to-end batch-wise training. Experimental results on several datasets, including Human3.6M, CMU Mocap, and InterHand, demonstrate the superiority of our framework.
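To make the attention idea in the abstract more concrete, the following is a minimal NumPy sketch of multi-head self-attention applied across the frames of a sequence. It is an illustration under stated assumptions, not the authors' architecture: the function name `multi_head_self_attention`, the weight matrices, the feature layout (one flattened 3D shape per frame), and all sizes are hypothetical, and the delayed regularizers mentioned in the abstract are omitted. The point it shows is that each row of the attention matrix is a set of non-negative weights summing to one, so every output frame is a weighted combination of projected features from all frames in the sequence, which loosely resembles the role of a self-expressive coefficient matrix.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Self-attention over frames (hypothetical helper, for illustration only).

    X: (F, D) array with one feature vector per frame.
    Returns Y of shape (F, D) and attention weights A of shape (H, F, F).
    """
    F, D = X.shape
    d = D // num_heads  # per-head feature size; D must be divisible by num_heads

    def split(M):
        # (F, D) -> (H, F, d): give each head its own slice of the features
        return M.reshape(F, num_heads, d).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)

    # A[h, i, j] is how strongly frame i attends to frame j in head h.
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d), axis=-1)

    heads = A @ V  # (H, F, d): each frame becomes a mix of all frames
    Y = heads.transpose(1, 0, 2).reshape(F, D) @ Wo
    return Y, A


# Assumed toy sizes: 8 frames, 5 keypoints, 3 heads.
rng = np.random.default_rng(0)
num_frames, num_points, num_heads = 8, 5, 3
D = 3 * num_points  # flattened per-frame 3D shape, assumed layout

X = rng.standard_normal((num_frames, D))
Wq, Wk, Wv, Wo = (0.1 * rng.standard_normal((D, D)) for _ in range(4))

Y, A = multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads)
print(Y.shape, A.shape)
```

Because the attention weights are computed from the batch of frames itself rather than stored as a fixed per-sequence coefficient matrix, a layer of this kind can be trained on arbitrary mini-batches of sequences, which is consistent with the batch-wise training motivation stated in the abstract.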