Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos

Paper Title

Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos

Authors

Reza Ghoddoosian, Isht Dwivedi, Nakul Agarwal, Chiho Choi, Behzad Dariush

Abstract

This paper addresses a new problem: weakly-supervised online action segmentation in instructional videos. We present a framework that segments streaming videos online at test time using dynamic programming, and show its advantages over a greedy sliding-window approach. We improve the framework by introducing the Online-Offline Discrepancy Loss (OODL), which encourages segmentation results with higher temporal consistency. Furthermore, during training only, we exploit frame-wise correspondence between multiple views as supervision for training on weakly-labeled instructional videos. In particular, we investigate three multi-view inference techniques that generate more accurate frame-wise pseudo ground truth at no additional annotation cost. We present results and ablation studies on two benchmark multi-view datasets, Breakfast and IKEA ASM. Experimental results show the efficacy of the proposed methods, both qualitatively and quantitatively, in two domains: cooking and assembly.
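To make the contrast between greedy per-frame labeling and dynamic-programming-based online labeling concrete, here is a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that the order of actions (`transcript`) is known and that per-frame action probabilities (`probs`) are already available; the function names `greedy_online` and `dp_online` and all numbers are hypothetical.

```python
import math

def greedy_online(log_probs):
    """Label each frame independently with its highest-scoring action."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in log_probs]

def dp_online(log_probs, transcript):
    """Label each incoming frame with a left-to-right DP over `transcript`.

    score[k] is the best total log-probability of aligning all frames seen
    so far to transcript[0..k], with the current frame assigned to
    transcript[k]. Assumption: the action order is given.
    """
    n = len(transcript)
    score = [-math.inf] * n
    labels = []
    for t, frame in enumerate(log_probs):
        new_score = [-math.inf] * n
        for k, action in enumerate(transcript):
            if t == 0:
                # The first frame must belong to the first action.
                prev = 0.0 if k == 0 else -math.inf
            else:
                stay = score[k]
                advance = score[k - 1] if k > 0 else -math.inf
                prev = max(stay, advance)
            new_score[k] = prev + frame[action]
        score = new_score
        # Emit the label of the best partial alignment ending at this frame.
        best_k = max(range(n), key=score.__getitem__)
        labels.append(transcript[best_k])
    return labels

# Toy per-frame probabilities for 3 actions; frame 2 is a noisy outlier.
probs = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.3, 0.2, 0.5],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
]
log_probs = [[math.log(p) for p in frame] for frame in probs]

greedy = greedy_online(log_probs)
online = dp_online(log_probs, transcript=[0, 1, 2])
print(greedy)  # [0, 0, 2, 1, 1, 2]
print(online)  # [0, 0, 0, 1, 1, 2]
```

In this toy input the greedy labeling jumps to action 2 on the noisy third frame and then back to action 1, while the DP labeling respects the action order and stays temporally consistent. The sketch decodes frame by frame without revisiting earlier labels; it does not model the OODL training loss or the multi-view pseudo ground-truth generation described in the abstract.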
