Paper Title

Using Paraphrases to Study Properties of Contextual Embeddings

Authors

Laura Burdick, Jonathan K. Kummerfeld, Rada Mihalcea

Abstract

We use paraphrases as a unique source of data to analyze contextualized embeddings, with a particular focus on BERT. Because paraphrases naturally encode consistent word and phrase semantics, they provide a unique lens for investigating properties of embeddings. Using the Paraphrase Database's alignments, we study words within paraphrases as well as phrase representations. We find that contextual embeddings effectively handle polysemous words, but give synonyms surprisingly different representations in many cases. We confirm previous findings that BERT is sensitive to word order, but find slightly different patterns than prior work in terms of the level of contextualization across BERT's layers.
