Paper Title

Embedding Meta-Textual Information for Improved Learning to Rank

Paper Authors

Toshitaka Kuwa, Shigehiko Schamoni, Stefan Riezler

Paper Abstract

Neural approaches to learning term embeddings have led to improved computation of similarity and ranking in information retrieval (IR). So far neural representation learning has not been extended to meta-textual information that is readily available for many IR tasks, for example, patent classes in prior-art retrieval, topical information in Wikipedia articles, or product categories in e-commerce data. We present a framework that learns embeddings for meta-textual categories, and optimizes a pairwise ranking objective for improved matching based on combined embeddings of textual and meta-textual information. We show considerable gains in an experimental evaluation on cross-lingual retrieval in the Wikipedia domain for three language pairs, and in the Patent domain for one language pair. Our results emphasize that the mode of combining different types of information is crucial for model improvement.
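
The abstract does not spell out the implementation, but the core idea it describes, embedding terms and meta-textual categories separately, combining the two embeddings, and training with a pairwise ranking objective, can be illustrated with a minimal PyTorch sketch. The layer sizes, the concatenation-based combination, the cosine scoring, and the margin value below are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch (PyTorch) of combining textual and meta-textual embeddings
# and training them with a pairwise ranking (margin) loss.
# All dimensions and the combination strategy are assumptions for illustration.
import torch
import torch.nn as nn


class CombinedEmbedder(nn.Module):
    def __init__(self, vocab_size, num_categories, text_dim=256, meta_dim=32):
        super().__init__()
        self.text_emb = nn.EmbeddingBag(vocab_size, text_dim)      # averages term embeddings
        self.meta_emb = nn.EmbeddingBag(num_categories, meta_dim)  # averages category embeddings
        self.proj = nn.Linear(text_dim + meta_dim, text_dim)       # combines both views

    def forward(self, term_ids, category_ids):
        text_vec = self.text_emb(term_ids)        # (batch, text_dim)
        meta_vec = self.meta_emb(category_ids)    # (batch, meta_dim)
        return self.proj(torch.cat([text_vec, meta_vec], dim=-1))


def pairwise_ranking_loss(query_vec, pos_doc_vec, neg_doc_vec, margin=1.0):
    # Hinge loss on the score difference between a relevant (pos) and an
    # irrelevant (neg) document for the same query.
    pos_score = torch.cosine_similarity(query_vec, pos_doc_vec)
    neg_score = torch.cosine_similarity(query_vec, neg_doc_vec)
    return torch.clamp(margin - (pos_score - neg_score), min=0).mean()


# Toy usage: a batch of 2 queries/documents with 5 terms and 2 categories each.
model = CombinedEmbedder(vocab_size=10000, num_categories=50)
q = model(torch.randint(0, 10000, (2, 5)), torch.randint(0, 50, (2, 2)))
pos = model(torch.randint(0, 10000, (2, 5)), torch.randint(0, 50, (2, 2)))
neg = model(torch.randint(0, 10000, (2, 5)), torch.randint(0, 50, (2, 2)))
loss = pairwise_ranking_loss(q, pos, neg)
```

As the abstract emphasizes, the mode of combining the two information types (here, simple concatenation followed by a linear projection) is the part most likely to matter for retrieval quality.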
