Paper Title

Tracking and Reconstructing Hand Object Interactions from Point Cloud Sequences in the Wild

Paper Authors

Jiayi Chen, Mi Yan, Jiazhao Zhang, Yinzhen Xu, Xiaolong Li, Yijia Weng, Li Yi, Shuran Song, He Wang

Paper Abstract

In this work, we tackle the challenging task of jointly tracking hand-object poses and reconstructing their shapes from depth point cloud sequences in the wild, given the initial poses at frame 0. We propose, for the first time, a point-cloud-based hand joint tracking network, HandTrackNet, to estimate the inter-frame hand joint motion. HandTrackNet introduces a novel hand pose canonicalization module to ease the tracking task, yielding accurate and robust hand joint tracking. Our pipeline then reconstructs the full hand by converting the predicted hand joints into a template-based parametric hand model, MANO. For object tracking, we devise a simple yet effective module that estimates the object SDF from the first frame and performs optimization-based tracking. Finally, a joint optimization step is adopted to perform joint hand-object reasoning, which alleviates occlusion-induced ambiguity and further refines the hand pose. During training, the whole pipeline sees only purely synthetic data, synthesized with sufficient variations and with depth simulation for ease of generalization. The whole pipeline is robust to the generalization gaps and thus directly transferable to real in-the-wild data. We evaluate our method on two real hand-object interaction datasets, namely HO3D and DexYCB, without any finetuning. Our experiments demonstrate that the proposed method significantly outperforms previous state-of-the-art depth-based hand and object pose estimation and tracking methods, running at a frame rate of 9 FPS.
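To make the abstract's pipeline concrete, a few illustrative sketches follow. First, the hand pose canonicalization idea: the snippet below is a minimal PyTorch sketch, not the authors' implementation. It builds a hypothetical canonical hand frame from the previous frame's predicted joints (the 21-joint layout and the choice of palm axes are assumptions) and re-expresses the incoming point cloud in that frame, so the tracking network only needs to regress small inter-frame joint corrections.

```python
import torch
import torch.nn.functional as F

def canonicalize_hand(points, prev_joints):
    """Re-express a hand point cloud in a canonical frame built from the
    previous frame's predicted joints. Hypothetical 21-joint convention:
    row 0 = wrist, row 5 = index MCP, row 9 = middle MCP."""
    wrist = prev_joints[0]
    x = F.normalize(prev_joints[9] - wrist, dim=0)       # wrist -> middle MCP
    palm = F.normalize(prev_joints[5] - wrist, dim=0)    # wrist -> index MCP
    z = F.normalize(torch.linalg.cross(x, palm), dim=0)  # palm normal
    y = torch.linalg.cross(z, x)
    R = torch.stack([x, y, z])  # rows are the canonical axes in camera coords
    # Rotate so the palm is axis-aligned, translate the wrist to the origin.
    return (points - wrist) @ R.T

# Usage: points (N, 3) from the depth camera, prev_joints (21, 3).
points = torch.randn(1024, 3)
prev_joints = torch.randn(21, 3)
canon_points = canonicalize_hand(points, prev_joints)
```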
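Next, converting the tracked joints into a full hand: the paper fits the template-based parametric MANO model to the predicted joints. The sketch below shows one plausible optimization loop, assuming a hypothetical differentiable `mano_layer` callable mapping (pose, shape) parameters to 21 joints, e.g. a thin wrapper around a public MANO implementation; the 48-D pose and 10-D shape dimensions follow MANO's standard parameterization.

```python
import torch

def fit_mano_to_joints(pred_joints, mano_layer, iters=100, lr=1e-2):
    """Fit MANO parameters to the joints predicted by the tracking network.
    mano_layer is a hypothetical differentiable callable
    (pose (48,), shape (10,)) -> joints (21, 3)."""
    pose = torch.zeros(48, requires_grad=True)   # global rotation + articulation
    shape = torch.zeros(10, requires_grad=True)  # shape PCA coefficients
    opt = torch.optim.Adam([pose, shape], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        joints = mano_layer(pose, shape)
        # Mean squared per-joint distance to the tracked joints.
        loss = (joints - pred_joints).pow(2).sum(dim=-1).mean()
        loss.backward()
        opt.step()
    return pose.detach(), shape.detach()
```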
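Finally, the optimization-based object tracking step. The sketch below is again an illustrative assumption rather than the paper's code: a placeholder sphere SDF stands in for the SDF the method estimates from the first frame, and a 6-DoF pose initialized from the previous frame is refined by driving the observed depth points onto the SDF's zero level set.

```python
import torch

def sphere_sdf(p, radius=0.05):
    # Placeholder: a 5 cm sphere SDF. The paper instead estimates the
    # object SDF from the first-frame observation.
    return p.norm(dim=-1) - radius

def axis_angle_to_matrix(w):
    # Rodrigues' formula: rotation matrix from an axis-angle vector.
    theta = torch.sqrt((w * w).sum() + 1e-12)
    k = w / theta
    K = torch.zeros(3, 3, dtype=w.dtype)
    K[0, 1], K[0, 2] = -k[2], k[1]
    K[1, 0], K[1, 2] = k[2], -k[0]
    K[2, 0], K[2, 1] = -k[1], k[0]
    eye = torch.eye(3, dtype=w.dtype)
    return eye + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

def track_object_pose(points, sdf, w_init, t_init, iters=50, lr=1e-2):
    """Refine a 6-DoF object pose so observed points lie on the SDF zero
    level set. points: (N, 3) camera-frame depth points; w_init, t_init:
    axis-angle and translation carried over from the previous frame."""
    w = w_init.clone().requires_grad_(True)
    t = t_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w, t], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        R = axis_angle_to_matrix(w)
        # Map camera-frame points into the object frame: p_obj = R^T (p - t).
        p_obj = (points - t) @ R
        loss = sdf(p_obj).abs().mean()
        loss.backward()
        opt.step()
    return w.detach(), t.detach()

# Usage: initialize from the previous frame's pose estimate.
points = 0.05 * torch.randn(512, 3)
w, t = track_object_pose(points, sphere_sdf, torch.zeros(3), torch.zeros(3))
```

A final joint optimization over the hand and object poses, as the abstract describes, would combine objectives like these (plus contact and non-penetration terms) to resolve occlusion-induced ambiguity.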
