Paper Title
Team RUC_AIM3 Technical Report at ActivityNet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning
Paper Authors
Abstract
Detecting meaningful events in an untrimmed video is essential for dense video captioning. In this work, we propose a novel and simple model for event sequence generation and explore the temporal relationships among the events in a video. The proposed model omits the inefficient two-stage proposal generation and directly generates event boundaries in one pass, conditioned on bi-directional temporal dependency. Experimental results show that the proposed event sequence generation model produces more accurate and more diverse events within a small number of proposals. For event captioning, we follow our previous work and incorporate intra-event captioning models into our pipeline. The overall system achieves state-of-the-art performance on the Dense-Captioning Events in Videos task, with a METEOR score of 9.894 on the challenge test set.