Document Network Projection in Pretrained Word Embedding Space

Paper Title

Document Network Projection in Pretrained Word Embedding Space

Authors

Antoine Gourru, Adrien Guille, Julien Velcin, Julien Jacques

Abstract

We present Regularized Linear Embedding (RLE), a novel method that projects a collection of linked documents (e.g., a citation network) into a pretrained word embedding space. In addition to the textual content, we leverage a matrix of pairwise similarities that provides complementary information (e.g., the network proximity of two documents in a citation graph). We first build a simple word vector average for each document, and then use the similarities to alter this average representation. The resulting document representations can help solve many information retrieval tasks, such as recommendation, classification, and clustering. We demonstrate that our approach outperforms or matches existing document network embedding methods on node classification and link prediction tasks. Furthermore, we show that it helps identify relevant keywords to describe document classes.
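
The two steps named in the abstract (average the word vectors of each document, then alter that average using pairwise similarities) can be illustrated with a small NumPy sketch. This is not the authors' exact formulation: the abstract does not give the update rule, so the linear blend below, the mixing weight `lam`, the function names, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def average_word_vectors(doc_term_counts, word_vectors):
    """Return one vector per document: the count-weighted mean of its word vectors."""
    counts = np.asarray(doc_term_counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave empty documents as zero vectors
    return (counts / totals) @ np.asarray(word_vectors, dtype=float)


def smooth_with_similarities(doc_vectors, similarities, lam=0.5):
    """Blend each document vector with the similarity-weighted mean of the others.

    ASSUMPTION: a simple convex combination controlled by a hypothetical
    weight `lam`; the paper's actual rule may differ.
    """
    sims = np.asarray(similarities, dtype=float)
    row_sums = sims.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # isolated documents get a zero neighbor mean
    neighbor_mean = (sims / row_sums) @ doc_vectors
    return (1.0 - lam) * doc_vectors + lam * neighbor_mean


# Toy data (hypothetical): 3 documents, 4 vocabulary words, 2-dimensional embeddings.
counts = [[1, 1, 0, 0],
          [0, 0, 1, 1],
          [0, 1, 1, 0]]
word_vectors = [[1.0, 0.0],
                [1.0, 0.0],
                [0.0, 1.0],
                [0.0, 1.0]]
# Document 2 is linked to documents 0 and 1 (e.g., through citations).
similarities = [[0, 0, 1],
                [0, 0, 1],
                [1, 1, 0]]

averages = average_word_vectors(counts, word_vectors)
embeddings = smooth_with_similarities(averages, similarities, lam=0.5)
print(embeddings)
```

Because the output lives in the same space as the pretrained word vectors, document embeddings built this way can be compared directly with word vectors, which is the property the abstract relies on for identifying keywords that describe document classes.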
