Paper Title
Generative Adversarial Network for Future Hand Segmentation from Egocentric Video
Paper Authors
Paper Abstract
We introduce the novel problem of anticipating a time series of future hand masks from egocentric video. A key challenge is to model the stochasticity of future head motion, which globally impacts analysis of head-worn camera video. To this end, we propose a novel deep generative model, EgoGAN, which uses a 3D Fully Convolutional Network to learn a spatio-temporal video representation for pixel-wise visual anticipation, generates future head motion using a Generative Adversarial Network (GAN), and then predicts future hand masks based on the video representation and the generated head motion. We evaluate our method on both the EPIC-Kitchens and the EGTEA Gaze+ datasets. We conduct detailed ablation studies to validate the design choices of our approach. Furthermore, we compare our method with previous state-of-the-art methods for future image segmentation and show that our method predicts future hand masks more accurately.
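To make the three-stage pipeline concrete, the following is a minimal PyTorch sketch of the architecture the abstract outlines: a 3D fully convolutional encoder, a GAN generator for future head motion, and a decoder that predicts future hand masks conditioned on both. All layer sizes, the per-frame 2D-displacement parameterization of head motion, and the conditioning scheme are illustrative assumptions, not the paper's actual design; the discriminator and adversarial training loop are also omitted.

```python
# Minimal sketch of the EgoGAN pipeline described in the abstract.
# Module sizes, names, and the conditioning scheme are assumptions
# for illustration only, not the paper's actual architecture.
import torch
import torch.nn as nn

class Video3DEncoder(nn.Module):
    """3D fully convolutional encoder: video clip -> spatio-temporal features."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_ch, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, feat_ch, kernel_size=3, padding=1), nn.ReLU(),
        )
    def forward(self, clip):            # clip: (B, 3, T, H, W)
        return self.net(clip)           # (B, feat_ch, T, H, W)

class HeadMotionGenerator(nn.Module):
    """GAN generator: noise + pooled video features -> future head motion
    (here an assumed 2D displacement per future frame)."""
    def __init__(self, feat_ch=64, noise_dim=16, future_steps=4):
        super().__init__()
        self.future_steps = future_steps
        self.net = nn.Sequential(
            nn.Linear(feat_ch + noise_dim, 128), nn.ReLU(),
            nn.Linear(128, future_steps * 2),
        )
    def forward(self, feats, z):
        ctx = feats.mean(dim=(2, 3, 4))              # global average pool
        out = self.net(torch.cat([ctx, z], dim=1))
        return out.view(-1, self.future_steps, 2)    # (B, T_future, 2)

class HandMaskDecoder(nn.Module):
    """Predicts one future hand mask per step from video features
    conditioned on the generated head motion."""
    def __init__(self, feat_ch=64, future_steps=4):
        super().__init__()
        self.future_steps = future_steps
        self.motion_proj = nn.Linear(2, feat_ch)
        self.head = nn.Conv2d(feat_ch, 1, kernel_size=1)
    def forward(self, feats, motion):
        ctx = feats.mean(dim=2)                      # pool over time: (B, C, H, W)
        masks = []
        for t in range(self.future_steps):
            m = self.motion_proj(motion[:, t])       # (B, C)
            cond = ctx + m[:, :, None, None]         # broadcast motion over space
            masks.append(torch.sigmoid(self.head(cond)))
        return torch.stack(masks, dim=1)             # (B, T_future, 1, H, W)

# Forward pass on a dummy clip (batch of 2, 8 frames, 64x64 pixels).
encoder, generator, decoder = Video3DEncoder(), HeadMotionGenerator(), HandMaskDecoder()
clip = torch.randn(2, 3, 8, 64, 64)
feats = encoder(clip)
motion = generator(feats, torch.randn(2, 16))
masks = decoder(feats, motion)
print(masks.shape)  # torch.Size([2, 4, 1, 64, 64])
```

Because the generator is conditioned on a noise vector z, sampling several values of z yields several plausible future head motions and hence several plausible hand-mask sequences, which is how a GAN captures the stochasticity of head motion that the abstract emphasizes.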