Image Semantic Relation Generation

Title

Image Semantic Relation Generation

Authors

Du, Mingzhe

Abstract

Scene graphs provide a structured semantic understanding that goes beyond raw images. For downstream tasks such as image retrieval, visual question answering, visual relationship detection, and even autonomous vehicle technology, scene graphs can not only distil complex image information but also use semantic-level relations to correct the biases of visual models, which gives them broad application prospects. However, the heavy labour cost of constructing graph annotations may hinder the application of panoptic scene graph (PSG) generation in practical scenarios. Inspired by the observation that people usually identify the subject and the object first and then determine the relationship between them, we propose to decouple the scene graph generation task into two sub-tasks: (1) an image segmentation task that picks out the qualified objects, and (2) a restricted auto-regressive text generation task that generates the relation between the given objects. In this work, we therefore introduce Image Semantic Relation Generation (ISRG), a simple but effective image-to-text model that achieves 31 points on the OpenPSG dataset and outperforms strong baselines by 16 points (ResNet-50) and 5 points (CLIP), respectively.
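
The abstract does not say how the relation decoder is restricted. As a minimal sketch, assuming "restricted" means that decoding may only produce phrases from a fixed predicate vocabulary, the second sub-task can be illustrated with prefix-trie constrained greedy decoding run over every ordered pair of objects returned by the segmentation stage. The predicate list, `toy_scorer`, and every function name below are hypothetical; `toy_scorer` merely stands in for the learned image-to-text model and is not the ISRG architecture.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

END = "<eos>"  # hypothetical end-of-predicate token
# A scorer maps (tokens generated so far, candidate token) -> score.
Scorer = Callable[[List[str], str], float]


def build_trie(predicates: Sequence[str]) -> dict:
    """Prefix trie over whitespace-split predicate phrases; END closes each phrase."""
    root: dict = {}
    for phrase in predicates:
        node = root
        for tok in phrase.split() + [END]:
            node = node.setdefault(tok, {})
    return root


def constrained_greedy_decode(score: Scorer, trie: dict) -> str:
    """Greedy auto-regressive decoding in which each step may only emit tokens
    that keep the output a prefix of some valid predicate."""
    prefix: List[str] = []
    node = trie
    while True:
        allowed = list(node)  # tokens the trie permits at this position
        best = max(allowed, key=lambda tok: score(prefix, tok))
        if best == END:
            return " ".join(prefix)
        prefix.append(best)
        node = node[best]


def generate_relations(
    objects: Sequence[str],
    predicates: Sequence[str],
    make_scorer: Callable[[str, str], Scorer],
) -> List[Tuple[str, str, str]]:
    """Decode one predicate per ordered (subject, object) pair from stage 1."""
    trie = build_trie(predicates)
    return [
        (subj, constrained_greedy_decode(make_scorer(subj, obj), trie), obj)
        for subj, obj in permutations(objects, 2)
    ]


# Hypothetical predicate vocabulary for illustration only.
PREDICATES = ["on", "sitting on", "standing on", "beside", "holding"]


def toy_scorer(subj: str, obj: str) -> Scorer:
    """Hypothetical stand-in for a learned decoder: it simply prefers the
    tokens of a hard-coded target phrase for the given object pair."""
    target = "sitting on" if (subj, obj) == ("person", "bench") else "beside"
    wanted = target.split() + [END]

    def score(prefix: List[str], tok: str) -> float:
        i = len(prefix)
        return 1.0 if i < len(wanted) and wanted[i] == tok else 0.0

    return score


if __name__ == "__main__":
    # Pretend the segmentation stage returned these two object labels.
    print(generate_relations(["person", "bench"], PREDICATES, toy_scorer))
    # → [('person', 'sitting on', 'bench'), ('bench', 'beside', 'person')]
```

Because the trie only exposes tokens that continue some valid predicate, the decoder always returns an in-vocabulary relation, even when the underlying scorer would prefer an unseen word. Greedy search is used here only for brevity; beam search over the same trie would be a natural alternative.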