Paper Title


Language with Vision: a Study on Grounded Word and Sentence Embeddings

Paper Authors

Hassan Shahmohammadi, Maria Heitmeier, Elnaz Shafaei-Bajestan, Hendrik P. A. Lensch, Harald Baayen

Abstract


Grounding language in vision is an active field of research seeking to construct cognitively plausible word and sentence representations by incorporating perceptual knowledge from vision into text-based representations. Despite many attempts at language grounding, achieving an optimal equilibrium between textual representations of the language and our embodied experiences remains an open field. Some common concerns are the following. Is visual grounding advantageous for abstract words, or is its effectiveness restricted to concrete words? What is the optimal way of bridging the gap between text and vision? To what extent is perceptual knowledge from images advantageous for acquiring high-quality embeddings? Leveraging the current advances in machine learning and natural language processing, the present study addresses these questions by proposing a simple yet very effective computational grounding model for pre-trained word embeddings. Our model effectively balances the interplay between language and vision by aligning textual embeddings with visual information while simultaneously preserving the distributional statistics that characterize word usage in text corpora. By applying a learned alignment, we are able to indirectly ground unseen words including abstract words. A series of evaluations on a range of behavioural datasets shows that visual grounding is beneficial not only for concrete words but also for abstract words, lending support to the indirect theory of abstract concepts. Moreover, our approach offers advantages for contextualized embeddings, such as those generated by BERT, but only when trained on corpora of modest, cognitively plausible sizes. Code and grounded embeddings for English are available at https://github.com/Hazel1994/Visually_Grounded_Word_Embeddings_2.
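The core idea described above — learning an alignment from textual embeddings to visual information while preserving the original distributional statistics, then applying that alignment to indirectly ground unseen words — can be sketched as a ridge-style linear mapping. This is a hypothetical toy illustration on synthetic data, not the authors' actual model; the dimensions, the synthetic "visual" targets, and the weight `lam` are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8          # embedding dimension (toy size)
n_seen = 100   # words for which paired visual embeddings exist

# Textual embeddings of seen words, and synthetic "visual" targets for them
T = rng.normal(size=(n_seen, d))
V = 0.5 * (T @ rng.normal(size=(d, d))) + 0.5 * T

lam = 1.0  # weight on preserving the original distributional embeddings

# Minimise ||T M - V||^2 + lam * ||T M - T||^2 over the alignment matrix M.
# Setting the gradient to zero gives the closed form:
#   (1 + lam) T^T T M = T^T (V + lam T)
A = (1.0 + lam) * (T.T @ T)
B = T.T @ (V + lam * T)
M = np.linalg.solve(A, B)

# An unseen word (e.g. an abstract word with no paired image) is grounded
# indirectly by applying the learned alignment to its textual embedding.
t_unseen = rng.normal(size=d)
grounded = t_unseen @ M
```

Because the identity mapping is always a feasible choice of `M`, the learned alignment moves the seen words' embeddings no further from their visual targets than they started, while the `lam` term keeps them anchored to their text-based positions — the "balance between language and vision" the abstract refers to.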
