Title
Online Multi-modal Person Search in Videos
Authors
Abstract
The task of searching for certain people in videos has seen increasing potential in real-world applications, such as video organization and editing. Most existing approaches are devised to work in an offline manner, where identities can only be inferred after an entire video is examined. This working manner precludes such methods from being applied to online services or to applications that require real-time responses. In this paper, we propose an online person search framework, which can recognize people in a video on the fly. This framework maintains a multimodal memory bank at its heart as the basis for person recognition, and updates it dynamically with a policy obtained by reinforcement learning. Our experiments on a large movie dataset show that the proposed method is effective, not only achieving remarkable improvements over online schemes but also outperforming offline methods.
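The abstract does not specify how the memory bank is structured or how the update policy is parameterized. The sketch below is only a minimal illustration of the general idea of online recognition against a dynamically updated multimodal memory. It rests on several assumptions that do not come from the paper. The two modalities are face and voice. Features are plain vectors compared by cosine similarity. Per-modality scores are fused by averaging. Finally, a fixed confidence threshold (the hypothetical `threshold_policy`) stands in for the reinforcement-learned policy. All names (`MemoryBank`, `Instance`, `recognize_online`) are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

Vector = List[float]
MODALITIES = ("face", "voice")  # assumed modalities, for illustration only


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Instance:
    """One person instance seen in the stream; a modality may be missing (None)."""
    face: Optional[Vector] = None
    voice: Optional[Vector] = None


@dataclass
class MemoryBank:
    # identity -> modality name -> stored feature vectors
    slots: Dict[str, Dict[str, List[Vector]]] = field(default_factory=dict)

    def register(self, identity: str, face: Vector) -> None:
        """Seed the bank with a portrait (face) feature of a query identity."""
        entry = self.slots.setdefault(identity, {m: [] for m in MODALITIES})
        entry["face"].append(face)

    def score(self, inst: Instance, identity: str) -> float:
        """Average the best per-modality similarity over modalities available on both sides."""
        scores = []
        for name in MODALITIES:
            feat = getattr(inst, name)
            stored = self.slots[identity][name]
            if feat is not None and stored:
                scores.append(max(cosine(feat, m) for m in stored))
        return sum(scores) / len(scores) if scores else 0.0

    def write(self, identity: str, inst: Instance) -> None:
        """Add every available modality of the instance to the identity's memory."""
        for name in MODALITIES:
            feat = getattr(inst, name)
            if feat is not None:
                self.slots[identity][name].append(feat)


def threshold_policy(confidence: float, tau: float = 0.8) -> bool:
    """Hypothetical stand-in for the learned policy: write only confident predictions."""
    return confidence >= tau


def recognize_online(
    bank: MemoryBank,
    stream: Iterable[Instance],
    policy: Callable[[float], bool] = threshold_policy,
) -> List[str]:
    """Label each instance as it arrives and update the memory bank on the fly.

    Assumes at least one identity has been registered in the bank.
    """
    labels = []
    for inst in stream:
        scores = {pid: bank.score(inst, pid) for pid in bank.slots}
        best = max(scores, key=scores.get)
        labels.append(best)
        if policy(scores[best]):
            bank.write(best, inst)  # memory grows as the video is processed
    return labels


if __name__ == "__main__":
    bank = MemoryBank()
    bank.register("alice", [1.0, 0.0])
    bank.register("bob", [0.0, 1.0])
    stream = [
        Instance(face=[0.9, 0.1], voice=[1.0, 0.0, 0.0]),  # face and voice visible
        Instance(voice=[0.95, 0.05, 0.0]),                   # face not visible
    ]
    print(recognize_online(bank, stream))
```

In this toy run, the first instance is matched to `alice` by its face, and because the match is confident its voice feature is written into memory. The second instance has no visible face, but it can still be matched through the voice feature collected moments earlier. This illustrates why an online, dynamically updated multimodal memory can help when a single modality is unavailable. A learned policy would replace the fixed threshold when deciding which predictions are safe to write.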