Paper Title

ZSTAD: Zero-Shot Temporal Activity Detection

Authors

Lingling Zhang, Xiaojun Chang, Jun Liu, Minnan Luo, Sen Wang, Zongyuan Ge, Alexander Hauptmann

Abstract

An integral part of video analysis and surveillance is temporal activity detection, which aims to simultaneously recognize and localize activities in long untrimmed videos. Currently, the most effective temporal activity detection methods are based on deep learning, and they typically perform very well when trained on large-scale annotated videos. However, these methods are limited in real applications due to the unavailability of videos for certain activity classes and the time-consuming process of data annotation. To solve this challenging problem, we propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training can still be detected. We design an end-to-end deep network based on R-C3D as the architecture for this solution. The proposed network is optimized with an innovative loss function that considers the embeddings of activity labels and their super-classes while learning the common semantics of seen and unseen activities. Experiments on both the THUMOS14 and the Charades datasets show promising performance in terms of detecting unseen activities.
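
The abstract only describes the objective at a high level. Below is a minimal, hypothetical PyTorch sketch of one way clip features could be scored against word embeddings of activity labels and their super-classes, roughly in the spirit of the described loss. The module name ZeroShotActivityHead, the feature and embedding dimensions, the cosine-similarity scoring, and the weighting factor alpha are all illustrative assumptions, not the paper's actual R-C3D-based implementation.

```python
# Minimal sketch (not the paper's implementation): a zero-shot activity
# classification head that scores clip features against word embeddings of
# activity labels and their super-classes. All names, dimensions, and the
# loss weighting below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ZeroShotActivityHead(nn.Module):
    def __init__(self, feat_dim, embed_dim, label_embeds, superclass_embeds):
        super().__init__()
        # Project clip features into the label-embedding space.
        self.proj = nn.Linear(feat_dim, embed_dim)
        # Fixed word embeddings for seen activity labels and their super-classes.
        self.register_buffer("label_embeds", F.normalize(label_embeds, dim=-1))
        self.register_buffer("super_embeds", F.normalize(superclass_embeds, dim=-1))

    def forward(self, clip_feats):
        z = F.normalize(self.proj(clip_feats), dim=-1)
        # Cosine-similarity logits against label and super-class embeddings.
        return z @ self.label_embeds.t(), z @ self.super_embeds.t()


def zero_shot_loss(label_logits, super_logits, labels, super_labels, alpha=0.5):
    # Combine label-level and super-class-level cross-entropy; alpha is an
    # assumed weighting, not a value taken from the paper.
    return (F.cross_entropy(label_logits, labels)
            + alpha * F.cross_entropy(super_logits, super_labels))


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy setup: 10 seen activities, 3 super-classes, 512-d clip features,
    # 300-d word embeddings, and a batch of 4 activity proposals.
    label_embeds = torch.randn(10, 300)
    super_embeds = torch.randn(3, 300)
    head = ZeroShotActivityHead(512, 300, label_embeds, super_embeds)
    feats = torch.randn(4, 512)
    label_logits, super_logits = head(feats)
    loss = zero_shot_loss(label_logits, super_logits,
                          labels=torch.tensor([0, 3, 7, 1]),
                          super_labels=torch.tensor([0, 1, 2, 0]))
    print(loss.item())
```

At test time, the same projection could score proposals against the embeddings of unseen labels, which is what allows detection of activities never observed during training.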
