Paper Title

Graph Neural Network for Video Relocalization

Authors

Yuan Zhou, Mingfei Wang, Ruolin Wang, Shuwei Huo

Abstract

In this paper, we focus on the video relocalization task, which takes a query video clip as input and retrieves a semantically related video clip from another untrimmed long video. We find that in video relocalization datasets there is a phenomenon in which frame-level feature similarity is not consistently related to video-level feature similarity, and this affects feature fusion among frames; however, existing video relocalization methods do not fully account for it. Taking this phenomenon into account, we treat video features as a graph by concatenating the query video feature and the proposal video feature along the time dimension, where each timestep is treated as a node and each row of the feature matrix is treated as that node's feature. Then, leveraging the power of graph neural networks, we propose a Multi-Graph Feature Fusion Module to fuse the relational features of this graph. Evaluations on the ActivityNet v1.2 and Thumos14 datasets show that our proposed method outperforms state-of-the-art methods.
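The graph construction described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `fuse_with_gnn` is a hypothetical stand-in for the Multi-Graph Feature Fusion Module, using cosine-similarity-based adjacency and averaged message passing as assumed placeholders.

```python
import numpy as np

def build_graph_features(query_feat, proposal_feat):
    """Concatenate query and proposal features along the time dimension.
    Each row (timestep) of the result is one graph node; the row vector
    is that node's feature, as described in the abstract."""
    return np.concatenate([query_feat, proposal_feat], axis=0)

def fuse_with_gnn(nodes, steps=2):
    """Hypothetical message-passing sketch: build a dense adjacency from
    pairwise cosine similarity, row-normalize it with softmax, and
    aggregate neighbor features. The actual fusion module differs."""
    x = nodes
    for _ in range(steps):
        norm = np.linalg.norm(x, axis=1, keepdims=True) + 1e-8
        unit = x / norm
        sim = unit @ unit.T                                   # cosine similarity
        adj = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)  # softmax rows
        x = adj @ x                                           # aggregate neighbors
    return x

# Toy example: 4 query timesteps and 6 proposal timesteps, 16-dim features.
q = np.random.rand(4, 16)
p = np.random.rand(6, 16)
g = build_graph_features(q, p)   # graph with 10 nodes, shape (10, 16)
fused = fuse_with_gnn(g)         # fused node features, same shape
```

Because the query and proposal segments share one node set, message passing mixes information across the two videos, which is the intuition behind treating the concatenated features as a single graph.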
