Paper title
Deconstructing word embedding algorithms
Paper authors
Abstract
Word embeddings are reliable feature representations of words used to obtain high-quality results for various NLP applications. Uncontextualized word embeddings are used in many NLP tasks today, especially in resource-limited settings where high memory capacity and GPUs are not available. Given the historical success of word embeddings in NLP, we propose a retrospective on some of the most well-known word embedding algorithms. In this work, we deconstruct Word2vec, GloVe, and others into a common form, unveiling some of the common conditions that seem to be required for making performant word embeddings. We believe that the theoretical findings in this paper can provide a basis for more informed development of future models.