INDIGO: Intrinsic Multimodality for Domain Generalization

Paper title

INDIGO: Intrinsic Multimodality for Domain Generalization

Authors

Mangla, Puneet, Chandhok, Shivam, Aggarwal, Milan, Balasubramanian, Vineeth N, Krishnamurthy, Balaji

Abstract

For models to generalize under unseen domains (a.k.a. domain generalization), it is crucial to learn feature representations that are domain-agnostic and capture the underlying semantics that make up an object category. Recent advances in weakly supervised vision-language models, which learn holistic representations from cheap, noisy text annotations, have shown their capacity for semantic understanding by capturing object characteristics that generalize across domains. However, when multiple source domains are involved, the cost of curating textual annotations for every image in the dataset can blow up several times, depending on the number of domains. This makes the process tedious and infeasible, hindering us from directly using these supervised vision-language approaches to achieve the best generalization on an unseen domain. Motivated by this, we study how multimodal information from existing pre-trained multimodal networks can be leveraged in an "intrinsic" way to make systems generalize under unseen domains. To this end, we propose IntriNsic multimodality for DomaIn GeneralizatiOn (INDIGO), a simple and elegant way of leveraging the intrinsic modality present in these pre-trained multimodal networks along with the visual modality to enhance generalization to unseen domains at test time. We experiment on several domain generalization settings (ClosedDG, OpenDG, and Limited sources) and show state-of-the-art generalization performance on unseen domains. Further, we provide a thorough analysis to develop a holistic understanding of INDIGO.
