Paper title
Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization
Paper authors
Abstract
Weakly-supervised Temporal Action Localization (W-TAL) aims to classify and localize all action instances in an untrimmed video under only video-level supervision. However, without frame-level annotations, it is challenging for W-TAL methods to identify false positive action proposals and generate action proposals with precise temporal boundaries. In this paper, we present a Two-Stream Consensus Network (TSCN) to simultaneously address these challenges. The proposed TSCN features an iterative refinement training method, where a frame-level pseudo ground truth is iteratively updated, and used to provide frame-level supervision for improved model training and false positive action proposal elimination. Furthermore, we propose a new attention normalization loss to encourage the predicted attention to act like a binary selection, and promote the precise localization of action instance boundaries. Experiments conducted on the THUMOS14 and ActivityNet datasets show that the proposed TSCN outperforms current state-of-the-art methods, and even achieves comparable results with some recent fully-supervised methods.
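The abstract does not give the form of the attention normalization loss. As a minimal sketch of how a loss could push per-frame attention toward a binary selection, the Python below takes the mean of the lowest attention values minus the mean of the highest ones, so that minimizing it drives some frames toward 0 and others toward 1. The function name, the ratio parameter `s`, and the top/bottom averaging are assumptions made for illustration, not the formulation stated in the abstract.

```python
import numpy as np


def attention_normalization_loss(attention, s=8):
    """Hypothetical sketch: mean of the l lowest attention values
    minus the mean of the l highest, with l = max(1, T // s).

    attention: 1-D sequence of per-frame attention values in [0, 1].
    s: assumed ratio controlling how many frames count as "top"/"bottom".
    """
    attention = np.asarray(attention, dtype=float)
    num_frames = attention.shape[0]
    l = max(1, num_frames // s)

    # Sort ascending so the first l entries are the lowest values
    # and the last l entries are the highest.
    sorted_attention = np.sort(attention)
    bottom_mean = sorted_attention[:l].mean()
    top_mean = sorted_attention[-l:].mean()

    # The value lies in [-1, 0]; it reaches -1 when the lowest frames
    # are at 0 and the highest frames are at 1.
    return bottom_mean - top_mean


if __name__ == "__main__":
    flat = [0.5] * 16                  # undecided attention
    binary = [0.0] * 8 + [1.0] * 8     # attention acting as a binary selection
    print(attention_normalization_loss(flat))
    print(attention_normalization_loss(binary))
```

Under these assumptions, flat attention gives the largest possible loss and a cleanly separated attention sequence gives the smallest, which is the behavior the abstract describes qualitatively.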