Title
Meta Spatio-Temporal Debiasing for Video Scene Graph Generation
Authors
Abstract
Video scene graph generation (VidSGG) aims to parse video content into scene graphs, which involves modeling the spatio-temporal contextual information in the video. However, due to the long-tailed distribution of the training data, the generalization performance of existing VidSGG models can be affected by the spatio-temporal conditional bias problem. In this work, we propose a novel Meta Video Scene Graph Generation (MVSGG) framework that addresses this bias problem from a meta-learning perspective. Specifically, to handle various types of spatio-temporal conditional biases, our framework first constructs a support set and a group of query sets from the training data, where the data distribution of each query set differs from that of the support set w.r.t. one type of conditional bias. Then, by performing a novel meta training-and-testing process that optimizes the model to achieve good testing performance on these query sets after training on the support set, our framework effectively guides the model to learn to generalize well against biases. Extensive experiments demonstrate the efficacy of our proposed framework.
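The meta training-and-testing process described above follows the general recipe of gradient-based meta-learning: adapt on the support set, then optimize the original parameters for how well the adapted model performs on the bias-shifted query sets. Below is a minimal, hedged sketch of one such meta-optimization step in PyTorch. It is not the authors' implementation; the names (`meta_debias_step`, `support_set`, `query_sets`, `inner_lr`) and the use of a single inner gradient step are assumptions made purely for illustration.

```python
# A minimal sketch (not the authors' code) of one meta training-and-testing step,
# assuming a PyTorch model and hypothetical data iterables `support_set` and
# `query_sets` (one query set per type of spatio-temporal conditional bias).
import torch
from torch import nn
from torch.func import functional_call


def meta_debias_step(model: nn.Module,
                     support_set,                 # iterable of (inputs, targets)
                     query_sets,                  # list of iterables, one per bias type
                     loss_fn,                     # e.g. nn.CrossEntropyLoss()
                     meta_optimizer: torch.optim.Optimizer,
                     inner_lr: float = 1e-3) -> float:
    names_params = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    names = [n for n, _ in names_params]
    params = [p for _, p in names_params]

    # Meta-train: take one gradient step on the support set.
    support_loss = sum(loss_fn(model(x), y) for x, y in support_set)
    grads = torch.autograd.grad(support_loss, params, create_graph=True)
    adapted = {n: p - inner_lr * g for n, p, g in zip(names, params, grads)}

    # Meta-test: the adapted parameters must also perform well on every query
    # set, each of which is distribution-shifted w.r.t. one conditional bias.
    meta_loss = sum(loss_fn(functional_call(model, adapted, (x,)), y)
                    for query_set in query_sets
                    for x, y in query_set)

    # Update the original parameters with the meta-test loss; second-order
    # gradients flow through `adapted` because of create_graph=True.
    meta_optimizer.zero_grad()
    meta_loss.backward()
    meta_optimizer.step()
    return float(meta_loss.detach())
```

The design choice mirrored here is that the outer (meta-test) loss is back-propagated through the inner update, so the model is optimized not just to fit the support set but to generalize, after that adaptation, to query sets whose distributions differ w.r.t. specific conditional biases.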