Paper Title

Weakly Supervised Temporal Action Localization with Segment-Level Labels

Authors

Xinpeng Ding, Nannan Wang, Xinbo Gao, Jie Li, Xiaoyu Wang, Tongliang Liu

Abstract

Temporal action localization presents a trade-off between test performance and annotation-time cost. Fully supervised methods achieve good performance but require time-consuming boundary annotations. Weakly supervised methods rely on cheaper video-level category labels but perform worse. In this paper, we introduce a new segment-level supervision setting: a segment is labeled when the annotator observes an action occurring in it. We incorporate this segment-level supervision, along with a novel localization module, into training. Specifically, we devise a partial segment loss, regarded as a form of loss sampling, to learn integral action parts from labeled segments. Since the labeled segments cover only parts of actions, the model tends to overfit as training proceeds. To tackle this problem, we first obtain a similarity matrix from discriminative features guided by a sphere loss. Then, a propagation loss based on this matrix is devised to act as a regularization term, allowing implicit propagation to unlabeled segments during training. Experiments validate that our method can outperform video-level supervision methods with almost the same annotation time.
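
The abstract gives no formulas, so the following is only a rough sketch of how the two loss terms it names might fit together, not the authors' implementation. It assumes that the partial segment loss is a cross-entropy averaged over labeled segments only, that the similarity matrix is a cosine similarity over per-segment features, and that the propagation loss is a similarity-weighted smoothness penalty on predicted class distributions. The sphere loss that shapes the features is not modeled here. All function names and the 0.1 weighting are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def partial_segment_loss(logits, labels):
    """Mean cross-entropy over labeled segments only (assumed form).

    labels[t] is a class index, or None when segment t is unlabeled.
    """
    losses = []
    for seg_logits, label in zip(logits, labels):
        if label is None:
            continue  # unlabeled segments contribute nothing to this term
        losses.append(-math.log(softmax(seg_logits)[label]))
    return sum(losses) / len(losses) if losses else 0.0


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between per-segment feature vectors."""
    norms = [math.sqrt(sum(x * x for x in f)) for f in features]
    n = len(features)
    return [
        [
            sum(a * b for a, b in zip(features[i], features[j]))
            / max(norms[i] * norms[j], 1e-12)  # guard against zero vectors
            for j in range(n)
        ]
        for i in range(n)
    ]


def propagation_loss(logits, similarity):
    """Similarity-weighted squared distance between segment predictions.

    Assumed stand-in for the paper's propagation loss: similar segments
    are pushed toward similar class distributions, so supervision from
    labeled segments reaches unlabeled ones implicitly.
    """
    probs = [softmax(seg_logits) for seg_logits in logits]
    n = len(probs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            weight = max(similarity[i][j], 0.0)  # ignore negative similarity
            dist = sum((p - q) ** 2 for p, q in zip(probs[i], probs[j]))
            total += weight * dist
    return total / (n * n)


# Toy example: three segments, two classes, middle segment unlabeled.
logits = [[2.0, 0.1], [1.5, 0.3], [0.2, 1.8]]
labels = [0, None, 1]
features = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]

similarity = cosine_similarity_matrix(features)
# The 0.1 regularization weight is an arbitrary illustrative choice.
total_loss = partial_segment_loss(logits, labels) + 0.1 * propagation_loss(logits, similarity)
print(total_loss)
```

In this sketch the unlabeled middle segment affects training only through the propagation term, which is the regularizing role the abstract attributes to that loss.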
