Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models

Paper Title

Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models

Authors

Ricardo Kleinlein, Cristina Luna-Jiménez, Fernando Fernández-Martínez

Abstract

The impressive capacity shown by recent text-to-image diffusion models to generate high-quality pictures from textual input prompts has fuelled the debate about the very definition of art. Nonetheless, these models have been trained on text data collected through content-based labelling protocols that focus on describing the items and actions in an image but neglect any subjective appraisal. Consequently, these automatic systems need rigorous descriptions of the elements and the pictorial style of the image to be generated, and otherwise fail to deliver. As potential indicators of the actual artistic capabilities of current generative models, we characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models. Considering the sharp difference observed between their language style and that typically employed in artistic contexts, we suggest that generative models should incorporate additional sources of subjective information in their training in order to overcome (or at least alleviate) some of their current limitations, thus effectively unleashing truly artistic and creative generation.
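The abstract says the authors characterise the sentimentality, objectiveness and degree of abstraction of caption data, but it does not say how. As a hedged illustration of what such a measurement could look like, here is a minimal Python sketch under stated assumptions: the word lists `SUBJECTIVE` and `CONCRETENESS` and the functions `profile` and `corpus_profile` are hypothetical toy stand-ins, not the paper's method or resources. A real analysis would rely on established resources, such as published concreteness norms or a trained sentiment model.

```python
# Toy sketch only: tiny hand-made lexicons (hypothetical), NOT the
# lexicons or models used in the paper.
from statistics import mean

# Hypothetical evaluative/affective words signalling subjective appraisal.
SUBJECTIVE = {"beautiful", "sad", "lonely", "haunting", "joyful", "melancholic", "ugly"}

# Hypothetical concreteness ratings on a 1-5 scale (5 = very concrete).
CONCRETENESS = {
    "dog": 4.9, "table": 4.9, "man": 4.8, "red": 4.1, "sitting": 4.0,
    "hope": 1.5, "loss": 1.9, "lonely": 2.0, "freedom": 1.6, "sad": 2.2,
}

PUNCT = ".,!?;:"


def tokenize(text: str) -> list[str]:
    """Lower-case whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]


def profile(caption: str) -> dict[str, float]:
    """Score one caption.

    subjectivity: share of tokens found in the SUBJECTIVE list.
    concreteness: mean rating of the tokens that have one (NaN if none);
    lower values suggest more abstract language.
    """
    tokens = tokenize(caption)
    subj = sum(t in SUBJECTIVE for t in tokens) / len(tokens) if tokens else 0.0
    rated = [CONCRETENESS[t] for t in tokens if t in CONCRETENESS]
    conc = mean(rated) if rated else float("nan")
    return {"subjectivity": subj, "concreteness": conc}


def corpus_profile(captions: list[str]) -> dict[str, float]:
    """Average per-caption subjectivity over a corpus of captions."""
    return {"subjectivity": mean(profile(c)["subjectivity"] for c in captions)}


if __name__ == "__main__":
    descriptive = "A man sitting at a red table with a dog."
    figurative = "A lonely, sad portrait of hope and loss."
    print(profile(descriptive))  # no subjective words, high concreteness
    print(profile(figurative))   # some subjective words, low concreteness
```

Under these toy lists, a purely descriptive caption of the kind produced by content-based labelling scores zero subjectivity and high concreteness. A figurative caption scores the opposite way, which is the kind of contrast the abstract reports between training captions and artistic language.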
