Title
Unsupervised Video Decomposition using Spatio-temporal Iterative Inference
Authors
Abstract
Unsupervised multi-object scene decomposition is a fast-emerging problem in representation learning. Despite significant progress in static scenes, such models are unable to leverage important dynamic cues present in video. We propose a novel spatio-temporal iterative inference framework that is powerful enough to jointly model complex multi-object representations and explicit temporal dependencies between latent variables across frames. This is achieved by leveraging a 2D-LSTM, with temporally conditioned inference and generation within the iterative amortized inference loop for posterior refinement. Our method improves the overall quality of decompositions, encodes information about the objects' dynamics, and can be used to predict the trajectory of each object separately. Additionally, we show that our model achieves high accuracy even without color information. We demonstrate the decomposition, segmentation, and prediction capabilities of our model and show that it outperforms the state-of-the-art on several benchmark datasets, one of which was curated for this work and will be made publicly available.
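The core mechanism the abstract describes, refining a variational posterior over several iterations per frame while a recurrent state carries information both across refinement steps and across frames, can be sketched in one dimension. Everything below (the scalar Gaussian model, the function names, and the leaky accumulator standing in for the paper's 2D-LSTM refinement network) is an illustrative assumption, not the authors' implementation:

```python
# Toy 1-D sketch of spatio-temporal iterative amortized inference.
# Generative model (assumed for illustration): x ~ N(z, SIGMA^2), prior z ~ N(0, 1).
SIGMA = 0.5  # observation noise

def elbo_grad(mu, x):
    # Gradient of log N(x | mu, SIGMA^2) + log N(mu | 0, 1) w.r.t. the
    # posterior mean mu -- the signal that iterative amortized inference
    # feeds to its refinement network.
    return (x - mu) / SIGMA**2 - mu

def refine(mu, h, x, step=0.1, decay=0.5):
    # One refinement step: a leaky accumulator h stands in for the
    # recurrent (2D-LSTM) state mapping ELBO gradients to posterior updates.
    h = decay * h + elbo_grad(mu, x)
    return mu + step * h, h

def infer_video(frames, n_iters=30):
    # Temporal conditioning: each frame's posterior is initialized from the
    # previous frame's refined posterior, and the recurrent state h
    # persists across both refinement iterations and frames.
    mu, h = 0.0, 0.0
    posteriors = []
    for x in frames:
        for _ in range(n_iters):
            mu, h = refine(mu, h, x)
        posteriors.append(mu)
    return posteriors

# A slowly moving "object position" observed over three frames.
posteriors = infer_video([1.0, 1.2, 1.4])
# Each refined mean approaches the analytic posterior mean x / (1 + SIGMA^2),
# so the sequence of posteriors tracks the object's motion.
```

The point of the sketch is the nesting: the inner loop is per-frame posterior refinement, while the outer loop propagates both the posterior and the recurrent state forward in time, which is what lets the refined latents encode object dynamics rather than treating each frame independently.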