P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation

Paper Title

P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation

Authors

Wenkang Shan, Zhenhua Liu, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao

Abstract

This paper introduces a novel Pre-trained Spatial Temporal Many-to-One (P-STMO) model for the 2D-to-3D human pose estimation task. To reduce the difficulty of capturing spatial and temporal information, we divide this task into two stages: pre-training (Stage I) and fine-tuning (Stage II). In Stage I, a self-supervised pre-training sub-task, termed masked pose modeling, is proposed. The human joints in the input sequence are randomly masked in both the spatial and temporal domains. A general form of denoising auto-encoder is used to recover the original 2D poses, and in this way the encoder learns to capture spatial and temporal dependencies. In Stage II, the pre-trained encoder is loaded into the STMO model and fine-tuned. The encoder is followed by a many-to-one frame aggregator that predicts the 3D pose in the current frame. In particular, an MLP block is used as the spatial feature extractor in STMO, which yields better performance than other methods. In addition, a temporal downsampling strategy is proposed to reduce data redundancy. Extensive experiments on two benchmarks show that our method outperforms state-of-the-art methods with fewer parameters and less computational overhead. For example, our P-STMO model achieves 42.1 mm MPJPE on the Human3.6M dataset when using 2D poses from CPN as inputs. Meanwhile, it runs 1.5 to 7.1 times faster than state-of-the-art methods. Code is available at https://github.com/paTRICK-swk/P-STMO.
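The masked pose modeling step of Stage I can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `mask_pose_sequence`, the frame-by-joint list layout, the masking amounts (a fraction of whole frames plus a fixed number of joints in each remaining frame) and the use of (0.0, 0.0) as a stand-in for a mask token are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

Joint2D = Tuple[float, float]   # (x, y) image coordinates
PoseSeq = List[List[Joint2D]]   # indexed as seq[frame][joint]


def mask_pose_sequence(
    seq: PoseSeq,
    temporal_ratio: float = 0.5,
    joints_per_frame: int = 2,
    seed: int = 0,
) -> Tuple[PoseSeq, List[List[bool]]]:
    """Hypothetical helper: randomly mask a 2D pose sequence in time and space.

    Returns (masked_seq, mask), where mask[t][j] is True for hidden joints.
    Hidden coordinates become (0.0, 0.0), a stand-in for a mask token.
    The input sequence is not modified.
    """
    rng = random.Random(seed)
    num_frames, num_joints = len(seq), len(seq[0])

    # Temporal masking: hide every joint in a random subset of frames.
    n_hidden_frames = round(temporal_ratio * num_frames)
    hidden_frames = set(rng.sample(range(num_frames), n_hidden_frames))

    masked_seq: PoseSeq = []
    mask: List[List[bool]] = []
    for t in range(num_frames):
        if t in hidden_frames:
            hidden = set(range(num_joints))
        else:
            # Spatial masking: hide a few random joints in each remaining frame.
            hidden = set(rng.sample(range(num_joints), joints_per_frame))
        mask.append([j in hidden for j in range(num_joints)])
        masked_seq.append(
            [(0.0, 0.0) if j in hidden else seq[t][j] for j in range(num_joints)]
        )
    return masked_seq, mask
```

On an 8-frame, 17-joint clip the defaults hide all joints in 4 frames and 2 joints in each of the other 4. A denoising auto-encoder would take `masked_seq` as input and be trained to reconstruct `seq`: filling in whole missing frames requires temporal context, while filling in missing joints can draw on the other joints of the same frame.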
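The temporal downsampling and many-to-one ideas can be sketched together, assuming that downsampling means sampling a fixed number of input frames at a regular stride around the frame whose 3D pose is predicted. The helper `downsampled_window`, its parameters and the edge-clamping rule are hypothetical; the sketch only shows how a strided window covers a longer time span with fewer frames.

```python
from typing import List, TypeVar

T = TypeVar("T")


def downsampled_window(frames: List[T], center: int, num_frames: int, stride: int) -> List[T]:
    """Hypothetical helper: pick `num_frames` frames around `center`, `stride` apart.

    num_frames must be odd so the centre frame sits in the middle of the window.
    Indices past either end of the clip are clamped to the first/last frame
    (edge padding), so the output always has exactly num_frames entries.
    """
    if num_frames % 2 == 0:
        raise ValueError("num_frames must be odd")
    half = num_frames // 2
    last = len(frames) - 1
    indices = [min(max(center + k * stride, 0), last) for k in range(-half, half + 1)]
    return [frames[i] for i in indices]
```

For example, `downsampled_window(list(range(100)), center=50, num_frames=5, stride=3)` returns frames 44, 47, 50, 53 and 56. The window spans the same 13 frames as a dense 13-frame window but passes only 5 of them to the model, which in a many-to-one setup would output a single 3D pose for frame 50.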
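The reported 42.1 mm figure uses MPJPE (Mean Per-Joint Position Error), the mean Euclidean distance between predicted and ground-truth 3D joint positions. A minimal sketch for a single pose follows. Evaluation on Human3.6M typically expresses both poses relative to a root joint and averages over all evaluated frames; this hypothetical `mpjpe` helper leaves those steps to the caller.

```python
import math
from typing import Sequence, Tuple

Joint3D = Tuple[float, float, float]  # (x, y, z), e.g. in millimetres


def mpjpe(pred: Sequence[Joint3D], target: Sequence[Joint3D]) -> float:
    """Mean Per-Joint Position Error for one pose: average Euclidean distance over joints."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    # math.dist computes the Euclidean distance between two points (Python 3.8+).
    return sum(math.dist(p, g) for p, g in zip(pred, target)) / len(pred)
```

For instance, with two ground-truth joints at the origin and predictions at (0, 0, 0) and (3, 4, 0), the per-joint errors are 0 and 5, so `mpjpe` returns 2.5 in the unit of the coordinates.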
