Paper Title
DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model
Authors
Abstract
Thanks to the development of 2D keypoint detectors, monocular 3D human pose estimation (HPE) via 2D-to-3D uplifting approaches has achieved remarkable improvements. Still, monocular 3D HPE remains a challenging problem due to inherent depth ambiguities and occlusions. To mitigate these difficulties, many previous works exploit temporal information. However, in many real-world applications frame sequences are not accessible. This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection. Rather than exploiting temporal information, we alleviate the depth ambiguity by generating multiple 3D pose candidates, all of which can be mapped to the same 2D keypoints. We build a novel diffusion-based framework that effectively samples diverse 3D poses conditioned on the detections of an off-the-shelf 2D detector. By replacing the conventional denoising U-Net with a graph convolutional network that accounts for the correlations between human joints, our approach achieves further performance improvements. We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets. Comprehensive experiments demonstrate the efficacy of the proposed method and confirm that our model outperforms state-of-the-art multi-hypothesis 3D HPE methods.
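The sampling idea in the abstract can be illustrated with a minimal sketch. It assumes a standard DDPM ancestral sampler (Ho et al., 2020) with a linear beta schedule. The noise predictor is a toy two-layer graph convolution over a symmetric-normalized skeleton adjacency, conditioned on the 2D keypoints. The 17-joint parent list, the denoiser inputs, and every name (`GraphDenoiser`, `ddpm_sample`, `min_mpjpe`, `PARENTS`) are hypothetical illustrations and not the authors' architecture. The weights are random, so the outputs only show shapes and data flow. `min_mpjpe` follows the common multi-hypothesis protocol, which reports the error of the best of H candidates.

```python
import numpy as np

J = 17  # assumed Human3.6M-style 17-joint skeleton
# Hypothetical parent index per joint (-1 = root); a commonly used 17-joint tree.
PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 9, 8, 11, 12, 8, 14, 15]


def normalized_adjacency(parents):
    """Symmetric-normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    n = len(parents)
    a = np.eye(n)
    for child, parent in enumerate(parents):
        if parent >= 0:
            a[child, parent] = a[parent, child] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


class GraphDenoiser:
    """Toy two-layer GCN that predicts the noise eps (random, untrained weights).

    Per-joint input features: noisy 3D coords (3) + 2D keypoint condition (2)
    + normalized timestep (1). This input layout is an assumption.
    """

    def __init__(self, adj, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.adj = adj
        self.w1 = rng.normal(0.0, 0.1, (6, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, 3))

    def __call__(self, x_t, kp2d, t, num_steps):
        # x_t: (H, J, 3) noisy hypotheses; kp2d: (J, 2) shared 2D condition
        cond = np.broadcast_to(kp2d, (x_t.shape[0],) + kp2d.shape)
        t_feat = np.full(x_t.shape[:2] + (1,), t / num_steps)
        feats = np.concatenate([x_t, cond, t_feat], axis=-1)  # (H, J, 6)
        hidden = np.maximum(self.adj @ feats @ self.w1, 0.0)  # graph conv + ReLU
        return self.adj @ hidden @ self.w2                     # (H, J, 3)


def ddpm_sample(denoiser, kp2d, num_hypotheses=5, num_steps=50, seed=0):
    """Ancestral DDPM sampling of several 3D pose hypotheses for one 2D input."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal((num_hypotheses, kp2d.shape[0], 3))  # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        eps = denoiser(x, kp2d, t, num_steps)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise  # sigma_t^2 = beta_t
    return x  # (H, J, 3)


def min_mpjpe(hypotheses, gt):
    """Best-of-H MPJPE: mean per-joint error of the hypothesis closest to gt."""
    per_hyp = np.linalg.norm(hypotheses - gt[None], axis=-1).mean(axis=-1)  # (H,)
    return per_hyp.min()


adj = normalized_adjacency(PARENTS)
denoiser = GraphDenoiser(adj)
kp2d = np.random.default_rng(1).standard_normal((J, 2))  # stand-in for detector output
poses = ddpm_sample(denoiser, kp2d, num_hypotheses=5)
print(poses.shape)  # (5, 17, 3)
print(min_mpjpe(poses, np.zeros((J, 3))))
```

Because every hypothesis starts from independent Gaussian noise but shares the same 2D condition, a trained denoiser would map them to distinct 3D poses consistent with one set of keypoints. That is how sampling addresses depth ambiguity without temporal context.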