Paper Title

C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval

Paper Authors

Eugene Yang, Suraj Nair, Ramraj Chandradevan, Rebecca Iglesias-Flores, Douglas W. Oard

Paper Abstract

Pretrained language models have improved effectiveness on numerous tasks, including ad-hoc retrieval. Recent work has shown that continuing to pretrain a language model with auxiliary objectives before fine-tuning on the retrieval task can further improve retrieval effectiveness. Unlike monolingual retrieval, designing an appropriate auxiliary task for cross-language mappings is challenging. To address this challenge, we use comparable Wikipedia articles in different languages to further pretrain off-the-shelf multilingual pretrained models before fine-tuning on the retrieval task. We show that our approach yields improvements in retrieval effectiveness.
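The abstract does not spell out the continued-pretraining objective, so the following is a minimal sketch of contrastive weak supervision over comparable Wikipedia article pairs, assuming an in-batch InfoNCE loss. The model name, mean pooling, temperature, and toy article pairs are illustrative assumptions, not the paper's reported setup.

```python
# Minimal sketch: continued pretraining with a contrastive objective over
# comparable Wikipedia article pairs (assumed InfoNCE; not the paper's
# exact configuration).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # any off-the-shelf multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)

def encode(texts):
    """Mean-pool token embeddings into one vector per text."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=256, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)           # (B, H)

def contrastive_loss(src_texts, tgt_texts, temperature=0.05):
    """In-batch InfoNCE: each article's comparable counterpart in the
    other language is the positive; all other articles in the batch
    serve as negatives."""
    src = F.normalize(encode(src_texts), dim=-1)
    tgt = F.normalize(encode(tgt_texts), dim=-1)
    logits = src @ tgt.T / temperature      # (B, B) pairwise similarities
    labels = torch.arange(len(src_texts))   # diagonal entries are positives
    return F.cross_entropy(logits, labels)

# Toy comparable pairs (illustrative only, not from the paper's data).
loss = contrastive_loss(
    ["Paris is the capital of France.", "The moon orbits the Earth."],
    ["巴黎是法国的首都。", "月球绕地球运行。"],
)
loss.backward()  # update the encoder, then fine-tune on the retrieval task
```

Using in-batch negatives this way pulls comparable articles in different languages toward a shared representation while pushing unrelated articles apart, which is one plausible way to instill the cross-language mapping before retrieval fine-tuning.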
