Paper Title

CLOP: Video-and-Language Pre-Training with Knowledge Regularizations

Paper Authors

Guohao Li, Hu Yang, Feng He, Zhifan Feng, Yajuan Lyu, Hua Wu, Haifeng Wang

Paper Abstract

Video-and-language pre-training has shown promising results for learning generalizable representations. Most existing approaches model video and text implicitly, without considering explicit structural representations of the multi-modal content. We denote this form of representation as structural knowledge, which expresses rich semantics at multiple granularities. Related works have proposed object-aware approaches that inject similar knowledge as inputs. However, existing methods usually fail to effectively utilize such knowledge as regularizations to shape a superior cross-modal representation space. To this end, we propose a Cross-modaL knOwledge-enhanced Pre-training (CLOP) method with Knowledge Regularizations. Our method has two key designs: 1) a simple yet effective Structural Knowledge Prediction (SKP) task that pulls together the latent representations of similar videos; and 2) a novel Knowledge-guided sampling approach for Contrastive Learning (KCL) that pushes apart cross-modal hard negative samples. We evaluate our method on four text-video retrieval tasks and one multiple-choice QA task. The experiments show clear improvements, outperforming prior works by a substantial margin. In addition, we provide ablations and insights into how our method affects the latent representation space, demonstrating the value of incorporating knowledge regularizations into video-and-language pre-training.
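
To make the two regularizations concrete, here is a minimal PyTorch sketch. The abstract does not give the actual loss formulations, so everything below is an illustrative assumption rather than the authors' implementation: SKP is approximated as multi-label tag prediction from the video embedding (pulling videos with shared tags toward similar representations), and KCL as a symmetric InfoNCE loss whose negatives are up-weighted by a hypothetical `knowledge_sim` matrix (e.g., tag overlap between samples), which emphasizes cross-modal hard negatives. The names `skp_loss`, `kcl_loss`, `tag_head`, and `knowledge_sim` are all made up for this sketch.

```python
# Illustrative sketch only; the losses, knowledge extraction, and negative
# weighting below are assumptions, not the CLOP paper's implementation.
import torch
import torch.nn.functional as F

def skp_loss(video_emb, tag_targets, tag_head):
    """SKP approximated as multi-label tag prediction: videos that share
    structural knowledge (tags) are pulled toward similar representations."""
    logits = tag_head(video_emb)                       # (B, num_tags)
    return F.binary_cross_entropy_with_logits(logits, tag_targets)

def kcl_loss(video_emb, text_emb, knowledge_sim, temperature=0.07):
    """Symmetric InfoNCE with knowledge-guided negative weighting: negatives
    whose structural knowledge overlaps the anchor's are up-weighted, so the
    model works harder to push apart these hard negatives."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                     # (B, B) similarities
    B = logits.size(0)
    labels = torch.arange(B, device=logits.device)
    # Adding log-weights to off-diagonal logits multiplies each negative's
    # softmax term by (1 + knowledge_sim); the positive diagonal is unchanged.
    neg_mask = ~torch.eye(B, dtype=torch.bool, device=logits.device)
    weights = torch.where(neg_mask, 1.0 + knowledge_sim,
                          torch.ones_like(knowledge_sim))
    logits = logits + weights.log()
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.T, labels))

# Example usage with random data (shapes and tag vocabulary are made up).
B, D, T = 8, 256, 1000
video_emb, text_emb = torch.randn(B, D), torch.randn(B, D)
tag_targets = (torch.rand(B, T) > 0.99).float()        # sparse pseudo-tags
tag_head = torch.nn.Linear(D, T)
knowledge_sim = (tag_targets @ tag_targets.T).clamp(max=1.0)  # tag overlap
loss = skp_loss(video_emb, tag_targets, tag_head) \
     + kcl_loss(video_emb, text_emb, knowledge_sim)
```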
