Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation

Paper Title

Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation

Authors

Zhenguang Liu, Runyang Feng, Haoming Chen, Shuang Wu, Yixing Gao, Yunjun Gao, Xiang Wang

Abstract

Multi-frame human pose estimation has long been a compelling and fundamental problem in computer vision. The task is challenging because of the fast motion and pose occlusion that frequently occur in videos. State-of-the-art methods strive to incorporate additional visual evidence from neighboring frames (supporting frames) to facilitate pose estimation in the current frame (key frame). One aspect that has been overlooked so far is that current methods directly aggregate unaligned contexts across frames. The spatial misalignment between the pose features of the current frame and those of neighboring frames might lead to unsatisfactory results. More importantly, existing approaches build on the straightforward pose estimation loss, which unfortunately cannot constrain the network to fully leverage useful information from neighboring frames. To tackle these problems, we present a novel hierarchical alignment framework, which leverages coarse-to-fine deformations to progressively update a neighboring frame so that it aligns with the current frame at the feature level. We further propose to explicitly supervise the knowledge extraction from neighboring frames, guaranteeing that useful complementary cues are extracted. To achieve this goal, we theoretically analyze the mutual information between the frames and arrive at a loss that maximizes the task-relevant mutual information. These allow us to rank No. 1 in the Multi-frame Person Pose Estimation Challenge on the benchmark dataset PoseTrack2017 and to obtain state-of-the-art performance on the benchmarks Sub-JHMDB and PoseTrack2018. Our code is released at https://github.com/Pose-Group/FAMI-Pose, in the hope that it will be useful to the community.
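
The abstract does not state the form of the mutual information loss, so the following is background only and not the authors' objective. It is a minimal sketch of the textbook definition of mutual information for two discrete variables, which is the quantity such a loss aims to increase. The function name `mutual_information` and the two toy joint distributions are assumptions made for illustration; in the paper's setting the variables would be learned frame features rather than small probability tables.

```python
import math


def mutual_information(joint):
    """Return I(X; Y) in nats for a discrete joint distribution.

    joint[i][j] is P(X = i, Y = j); entries are non-negative and sum to 1.
    This is the standard definition, not the loss derived in the paper.
    """
    # Marginals: P(X = i) is a row sum, P(Y = j) is a column sum.
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]

    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:  # zero-probability cells contribute nothing
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi


# Hypothetical toy cases: X stands in for a supporting-frame cue,
# Y for a key-frame cue.
independent = [[0.25, 0.25], [0.25, 0.25]]  # X tells us nothing about Y
identical = [[0.5, 0.0], [0.0, 0.5]]        # X determines Y exactly

mi_independent = mutual_information(independent)
mi_identical = mutual_information(identical)
```

In the independent case the value is zero, and in the fully dependent binary case it equals log 2 nats. That contrast is the intuition behind encouraging features extracted from neighboring frames to carry information relevant to the key frame.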
