Self-supervision on Unlabelled OR Data for Multi-person 2D/3D Human Pose Estimation

Paper Title

Self-supervision on Unlabelled OR Data for Multi-person 2D/3D Human Pose Estimation

Authors

Vinkle Srivastav, Afshin Gangi, Nicolas Padoy

Abstract

2D/3D human pose estimation is needed to develop novel intelligent tools for the operating room that can analyze and support the clinical activities. The lack of annotated data and the complexity of state-of-the-art pose estimation approaches limit, however, the deployment of such techniques inside the OR. In this work, we propose to use knowledge distillation in a teacher/student framework to harness the knowledge present in a large-scale non-annotated dataset and in an accurate but complex multi-stage teacher network to train a lightweight network for joint 2D/3D pose estimation. The teacher network also exploits the unlabeled data to generate both hard and soft labels useful in improving the student predictions. The easily deployable network trained using this effective self-supervision strategy performs on par with the teacher network on MVOR+, an extension of the public MVOR dataset where all persons have been fully annotated, thus providing a viable solution for real-time 2D/3D human pose estimation in the OR.
