Paper title
HARP: Personalized Hand Reconstruction from a Monocular RGB Video
Paper authors
Abstract
We present HARP (HAnd Reconstruction and Personalization), a personalized hand avatar creation approach that takes a short monocular RGB video of a human hand as input and reconstructs a faithful hand avatar with high-fidelity appearance and geometry. In contrast to the major trend of neural implicit representations, HARP models a hand with a mesh-based parametric hand model, a vertex displacement map, a normal map, and an albedo, without any neural components. As validated by our experiments, the explicit nature of our representation enables a truly scalable, robust, and efficient approach to hand avatar creation. HARP is optimized via gradient descent from a short sequence captured by a hand-held mobile phone and can be directly used in AR/VR applications with real-time rendering capability. To enable this, we carefully design and implement a shadow-aware differentiable rendering scheme that is robust to the high-degree articulations and self-shadowing regularly present in hand motion sequences, as well as to challenging lighting conditions. It also generalizes to unseen poses and novel viewpoints, producing photo-realistic renderings of hand animations performing highly articulated motions. Furthermore, the learned HARP representation can be used to improve 3D hand pose estimation quality in challenging viewpoints. The key advantages of HARP are validated by in-depth analyses of appearance reconstruction, novel-view and novel-pose synthesis, and 3D hand pose refinement. It is an AR/VR-ready personalized hand representation that shows superior fidelity and scalability.
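The abstract's central idea, fitting explicit appearance parameters by gradient descent rather than training a network, can be illustrated with a deliberately tiny sketch. This is not HARP's renderer: it assumes per-vertex Lambertian shading with a single directional light, skips rasterization and shadows entirely, and fits only an albedo value per vertex. All function and variable names are hypothetical.

```python
# Toy sketch only: not HARP's method or code. Per-vertex Lambertian shading,
# no rasterization, no shadow handling; all names are hypothetical.

def cosine(normal, light):
    """Clamped cosine between a unit normal and a unit light direction."""
    return max(0.0, sum(n * l for n, l in zip(normal, light)))


def shade(albedo, normals, light):
    """Per-vertex intensity: albedo * max(0, n . l)."""
    return [a * cosine(n, light) for a, n in zip(albedo, normals)]


def fit_albedo(observed, normals, light, steps=200, lr=0.5):
    """Recover per-vertex albedo by gradient descent on squared error."""
    albedo = [0.5] * len(observed)  # explicit parameters, no network
    for _ in range(steps):
        rendered = shade(albedo, normals, light)
        for i, n in enumerate(normals):
            # Gradient of (rendered_i - observed_i)^2 w.r.t. albedo_i.
            grad = 2.0 * (rendered[i] - observed[i]) * cosine(n, light)
            albedo[i] -= lr * grad
    return albedo


# Synthetic "observation" generated from known albedo values.
normals = [(0.0, 0.0, 1.0), (0.0, 0.6, 0.8), (0.6, 0.0, 0.8)]
light = (0.0, 0.0, 1.0)
true_albedo = [0.9, 0.4, 0.7]
observed = shade(true_albedo, normals, light)
estimated = fit_albedo(observed, normals, light)
```

Because the shading function is differentiable in the albedo, the loop recovers the values used to generate the synthetic observation. A full pipeline of the kind the abstract describes would also optimize geometry and would need to account for shadows and visibility, which this sketch leaves out.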