Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning

Paper Title

Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning

Authors

Li, Xiangyu; Yang, Xu; Wei, Kun; Deng, Cheng; Yang, Muli

Abstract

Compositional Zero-Shot Learning (CZSL) aims to recognize unseen compositions formed from states and objects seen during training. Since the same state can vary in visual appearance when entangled with different objects, CZSL remains a challenging task. Some methods recognize state and object with two trained classifiers, ignoring the impact of the interaction between object and state; other methods try to learn a joint representation of state-object compositions, which leads to a domain gap between seen and unseen composition sets. In this paper, we propose a novel Siamese Contrastive Embedding Network (SCEN) (Code: https://github.com/XDUxyLi/SCEN-master) for unseen composition recognition. Considering the entanglement between state and object, we embed the visual feature into a Siamese Contrastive Space to capture their prototypes separately, alleviating the interaction between state and object. In addition, we design a State Transition Module (STM) to increase the diversity of training compositions, improving the robustness of the recognition model. Extensive experiments indicate that our method significantly outperforms state-of-the-art approaches on three challenging benchmark datasets, including the recently proposed C-GQA dataset.
