Paper Title

End-to-end Weakly-supervised Single-stage Multiple 3D Hand Mesh Reconstruction from a Single RGB Image

Authors

Jinwei Ren, Jianke Zhu, Jialiang Zhang

Abstract

In this paper, we consider the challenging task of simultaneously locating and recovering multiple hands from a single 2D image. Previous studies either focus on single-hand reconstruction or solve this problem in a multi-stage way. Moreover, the conventional two-stage pipeline first detects hand regions and then estimates the 3D hand pose from each cropped patch. To reduce the computational redundancy in preprocessing and feature extraction, we propose, for the first time, a concise but efficient single-stage pipeline for multi-hand reconstruction. Specifically, we design a multi-head auto-encoder structure, where each head network shares the same feature map and outputs the hand center, pose, and texture, respectively. In addition, we adopt a weakly-supervised scheme to alleviate the burden of expensive 3D real-world data annotation. To this end, we propose a series of losses optimized by a stage-wise training scheme, where a multi-hand dataset with 2D annotations is generated from publicly available single-hand datasets. To further improve the accuracy of the weakly-supervised model, we adopt several feature consistency constraints in both single- and multiple-hand settings. Specifically, the keypoints of each hand estimated from local features should be consistent with the re-projected points predicted from global features. Extensive experiments on public benchmarks including FreiHAND, HO3D, InterHand2.6M, and RHD demonstrate that our method outperforms state-of-the-art model-based methods in both the weakly-supervised and the fully-supervised settings. The code and models are available at https://github.com/zijinxuxu/SMHR.
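The two ideas in the abstract that lend themselves to a concrete illustration are (a) several head networks reading the same shared feature map and (b) the feature-consistency constraint between locally estimated keypoints and re-projected global predictions. The sketch below is a minimal NumPy illustration of both under assumed shapes (64-channel 32x32 feature map, 48 pose parameters, 21 joints); it is not the authors' actual architecture or loss implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed backbone feature-map size (illustrative, not from the paper).
C, H, W = 64, 32, 32
feat = rng.standard_normal((C, H, W))

def conv1x1(x, w):
    """A 1x1 convolution expressed as a per-pixel linear map."""
    # w: (c_out, C), x: (C, H, W) -> (c_out, H, W)
    return np.tensordot(w, x, axes=([1], [0]))

# Three heads consuming the SAME shared feature map, one per output
# quantity (hand-center heatmap, pose parameters, texture); channel
# counts are assumptions for illustration.
w_center = rng.standard_normal((1, C)) * 0.01   # hand-center heatmap head
w_pose   = rng.standard_normal((48, C)) * 0.01  # pose-parameter head
w_tex    = rng.standard_normal((3, C)) * 0.01   # texture head

center_map = conv1x1(feat, w_center)  # (1, H, W)
pose_map   = conv1x1(feat, w_pose)    # (48, H, W)
tex_map    = conv1x1(feat, w_tex)     # (3, H, W)

# Feature-consistency constraint from the abstract: 2D keypoints
# estimated from local features should agree with the keypoints
# re-projected from the global prediction. Sketched here as a mean
# L2 distance over 21 hand joints.
def consistency_loss(kpts_local, kpts_reproj):
    return float(np.mean(np.linalg.norm(kpts_local - kpts_reproj, axis=-1)))

kpts_local  = rng.uniform(0, W, size=(21, 2))        # from local features
kpts_reproj = kpts_local + rng.normal(0, 0.5, (21, 2))  # from global features
loss = consistency_loss(kpts_local, kpts_reproj)
```

Because every head only applies a lightweight map on top of a single shared feature extraction, adding hands (or heads) avoids re-running a backbone per cropped patch, which is the computational redundancy the single-stage design removes.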
