Paper Title

Semantic Relatedness and Taxonomic Word Embeddings

Authors

Magdalena Kacmajor, John D. Kelleher, Filip Klubicka, Alfredo Maldonado

Abstract

This paper connects a series of papers dealing with taxonomic word embeddings. It begins by noting that there are different types of semantic relatedness and that different lexical representations encode different forms of relatedness. A particularly important distinction within semantic relatedness is that between thematic and taxonomic relatedness. Next, we present a number of experiments that analyse taxonomic embeddings trained on a synthetic corpus generated via a random walk over a taxonomy. These experiments demonstrate how the properties of the synthetic corpus, such as the percentage of rare words, are affected by the shape of the knowledge graph the corpus is generated from. Finally, we explore how the relative sizes of the natural and synthetic corpora interact to affect the performance of embeddings when taxonomic and thematic embeddings are combined.
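
As a rough illustration of the corpus-generation idea mentioned in the abstract, the sketch below builds pseudo-sentences by walking at random over a tiny hand-made taxonomy and then measures the share of vocabulary items that occur rarely. The abstract does not give the authors' actual walk procedure or parameters, so everything here is an assumption made for illustration: the graph `TOY_TAXONOMY`, the function names, the stopping probability, the sentence length cap, and the rare-word threshold.

```python
import random
from collections import Counter

# Hypothetical toy taxonomy: each node lists its hypernym/hyponym neighbours.
TOY_TAXONOMY = {
    "animal": ["dog", "cat"],
    "dog": ["animal", "poodle"],
    "cat": ["animal"],
    "poodle": ["dog"],
}


def random_walk_sentence(graph, rng, max_len=5, stop_prob=0.2):
    """Emit one pseudo-sentence: the node labels visited by a random walk."""
    node = rng.choice(sorted(graph))  # start from a uniformly chosen node
    sentence = [node]
    # Continue until the length cap is hit or the walk stops at random.
    while len(sentence) < max_len and rng.random() >= stop_prob:
        node = rng.choice(graph[node])  # step to a uniformly chosen neighbour
        sentence.append(node)
    return sentence


def generate_corpus(graph, n_sentences, seed=0):
    """Generate a synthetic corpus as a list of pseudo-sentences."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [random_walk_sentence(graph, rng) for _ in range(n_sentences)]


def rare_word_fraction(corpus, vocabulary, threshold=10):
    """Fraction of vocabulary items occurring fewer than `threshold` times."""
    counts = Counter(word for sentence in corpus for word in sentence)
    rare = [word for word in vocabulary if counts[word] < threshold]
    return len(rare) / len(vocabulary)


if __name__ == "__main__":
    corpus = generate_corpus(TOY_TAXONOMY, n_sentences=50, seed=1)
    print(corpus[:3])
    print(rare_word_fraction(corpus, list(TOY_TAXONOMY)))
```

In a sketch like this, changing the shape of the graph (for example, making it deeper or adding many leaf nodes) changes how often each node is visited, which is the kind of corpus property the abstract says the experiments examine.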
