Paper Title

Motion2Vec: Semi-Supervised Representation Learning from Surgical Videos

Paper Authors

Ajay Kumar Tanwani, Pierre Sermanet, Andy Yan, Raghav Anand, Mariano Phielipp, Ken Goldberg

Paper Abstract

Learning meaningful visual representations in an embedding space can facilitate generalization in downstream tasks such as action segmentation and imitation. In this paper, we learn a motion-centric representation of surgical video demonstrations by grouping them into action segments/sub-goals/options in a semi-supervised manner. We present Motion2Vec, an algorithm that learns a deep embedding feature space from video observations by minimizing a metric learning loss in a Siamese network: images from the same action segment are pulled together and pushed away from randomly sampled images of other segments, while respecting the temporal ordering of the images. After pre-training the Siamese network, the embeddings are iteratively segmented with a recurrent neural network for a given parametrization of the embedding space. We use only a small set of labeled video segments to semantically align the embedding space, and assign pseudo-labels to the remaining unlabeled data by inference on the learned model parameters. We demonstrate the use of this representation to imitate surgical suturing motions from publicly available videos of the JIGSAWS dataset. Results give 85.5% segmentation accuracy on average, an improvement over several state-of-the-art baselines, while kinematic pose imitation gives 0.94 cm position error per observation on the test set. Videos, code, and data are available at https://sites.google.com/view/motion2vec
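As an illustration of the metric-learning step described in the abstract, below is a minimal PyTorch sketch (not the authors' released implementation) of a triplet objective over a Siamese embedding network: the anchor and positive are frames drawn from the same action segment, and the negative is a randomly sampled frame from a different segment. The network architecture, embedding dimension, image size, and margin are all illustrative assumptions.

```python
# Minimal sketch of the Siamese metric-learning step (illustrative, not
# the authors' code): same-segment frames are pulled together, frames
# from other segments are pushed away via a triplet margin loss.
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """Maps video frames to a low-dimensional embedding space."""
    def __init__(self, embed_dim=32):  # embed_dim is an assumption
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, embed_dim)

    def forward(self, x):
        h = self.conv(x).flatten(1)
        # L2-normalize so distances are compared on the unit sphere
        return nn.functional.normalize(self.fc(h), dim=1)

net = EmbeddingNet()
triplet_loss = nn.TripletMarginLoss(margin=0.2)  # margin is an assumption

# anchor/positive: frames from the same action segment (temporally close);
# negative: a randomly sampled frame from a different segment.
anchor = torch.randn(8, 3, 64, 64)
positive = torch.randn(8, 3, 64, 64)
negative = torch.randn(8, 3, 64, 64)

loss = triplet_loss(net(anchor), net(positive), net(negative))
loss.backward()
```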
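The semi-supervised segmentation step can be sketched in the same spirit: a recurrent network over the frame embeddings is trained on the small labeled subset, and its predictions then serve as pseudo-labels for the unlabeled videos. The layer sizes and the number of action segments below are assumptions, not values taken from the paper.

```python
# Minimal sketch of the pseudo-labeling step (illustrative assumptions):
# an RNN classifies each embedded frame into an action segment.
import torch
import torch.nn as nn

class SegmentRNN(nn.Module):
    def __init__(self, embed_dim=32, hidden=64, n_segments=11):
        super().__init__()
        self.rnn = nn.GRU(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_segments)

    def forward(self, emb_seq):  # emb_seq: (batch, time, embed_dim)
        h, _ = self.rnn(emb_seq)
        return self.head(h)      # per-frame segment logits

model = SegmentRNN()
criterion = nn.CrossEntropyLoss()

# Train on the small labeled subset of embedded sequences.
labeled_emb = torch.randn(4, 100, 32)     # dummy embedded frames
labels = torch.randint(0, 11, (4, 100))   # per-frame segment labels
logits = model(labeled_emb)
loss = criterion(logits.reshape(-1, 11), labels.reshape(-1))
loss.backward()

# Inference assigns pseudo-labels to the remaining unlabeled videos.
with torch.no_grad():
    unlabeled_emb = torch.randn(4, 100, 32)
    pseudo_labels = model(unlabeled_emb).argmax(dim=-1)
```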
