Paper Title
Whole-Body Human Pose Estimation in the Wild
Paper Authors
Paper Abstract
This paper investigates the task of 2D human whole-body pose estimation, which aims to localize dense landmarks on the entire human body, including the face, hands, body, and feet. As existing datasets do not have whole-body annotations, previous methods have to assemble separate deep models trained independently on different datasets of the human face, hand, and body, struggling with dataset biases and large model complexity. To fill this gap, we introduce COCO-WholeBody, which extends the COCO dataset with whole-body annotations. To the best of our knowledge, it is the first benchmark with manual annotations on the entire human body, comprising 133 dense landmarks: 68 on the face, 42 on the hands, and 23 on the body and feet. A single-network model, named ZoomNet, is devised to exploit the hierarchical structure of the full human body and to handle the scale variation across different body parts of the same person. ZoomNet significantly outperforms existing methods on the proposed COCO-WholeBody dataset. Extensive experiments show that COCO-WholeBody can not only be used to train deep models from scratch for whole-body pose estimation but can also serve as a powerful pre-training dataset for many different tasks such as facial landmark detection and hand keypoint estimation. The dataset is publicly available at https://github.com/jin-s13/COCO-WholeBody.
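To make the 133-landmark layout concrete, the minimal sketch below splits a flattened COCO-style keypoint list into its whole-body parts. The part ordering and exact index ranges (17 body, 6 feet, 68 face, 21 per hand) are assumptions chosen to be consistent with the abstract's counts (68 face + 42 hands + 23 body/feet = 133), not the dataset's official annotation format; consult the repository above for the authoritative layout.

# Minimal sketch: splitting a 133-keypoint whole-body annotation into parts.
# The index ranges below are assumptions consistent with the abstract's counts;
# verify against the official COCO-WholeBody annotation format.
import numpy as np

# Assumed ordering: body (17), feet (6), face (68), left hand (21), right hand (21).
PART_SLICES = {
    "body":       slice(0, 17),
    "feet":       slice(17, 23),
    "face":       slice(23, 91),
    "left_hand":  slice(91, 112),
    "right_hand": slice(112, 133),
}

def split_wholebody_keypoints(flat_kpts):
    """Reshape a flat [x1, y1, v1, x2, y2, v2, ...] list of 133 keypoints
    into per-part (N, 3) arrays of (x, y, visibility)."""
    kpts = np.asarray(flat_kpts, dtype=np.float32).reshape(133, 3)
    return {name: kpts[s] for name, s in PART_SLICES.items()}

# Usage example with dummy zeros in place of real annotation values.
parts = split_wholebody_keypoints([0.0] * (133 * 3))
for name, arr in parts.items():
    print(name, arr.shape)  # e.g. body (17, 3), face (68, 3), ...

Keeping all parts in one ordered array is what allows a single network such as ZoomNet to predict the whole body jointly, rather than assembling separate face, hand, and body models.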