DALLE-2 is Seeing Double: Flaws in Word-to-Concept Mapping in Text2Image Models

Paper title

DALLE-2 is Seeing Double: Flaws in Word-to-Concept Mapping in Text2Image Models

Paper authors

Royi Rassin, Shauli Ravfogel, Yoav Goldberg

Paper abstract

We study the way DALLE-2 maps symbols (words) in the prompt to their references (entities or properties of entities in the generated image). We show that, in stark contrast to the way humans process language, DALLE-2 does not follow the constraint that each word has a single role in the interpretation, and sometimes re-uses the same symbol for different purposes. We collect a set of stimuli that reflect the phenomenon: we show that DALLE-2 depicts both senses of nouns with multiple senses at once; and that a given word can modify the properties of two distinct entities in the image, or can be depicted as one object and also modify the properties of another object, creating a semantic leakage of properties between entities. Taken together, our study highlights the differences between DALLE-2 and human language processing and opens an avenue for future study on the inductive biases of text-to-image models.

Paper title