Paper Title

C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval

Paper Authors

Eugene Yang, Suraj Nair, Ramraj Chandradevan, Rebecca Iglesias-Flores, Douglas W. Oard

Paper Abstract

Pretrained language models have improved effectiveness on numerous tasks, including ad-hoc retrieval. Recent work has shown that continuing to pretrain a language model with auxiliary objectives before fine-tuning on the retrieval task can further improve retrieval effectiveness. Unlike monolingual retrieval, designing an appropriate auxiliary task for cross-language mappings is challenging. To address this challenge, we use comparable Wikipedia articles in different languages to further pretrain off-the-shelf multilingual pretrained models before fine-tuning on the retrieval task. We show that our approach yields improvements in retrieval effectiveness.
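The abstract does not spell out the continued-pretraining objective, so the following is a minimal sketch of contrastive weak supervision over comparable Wikipedia article pairs, assuming an in-batch InfoNCE loss. The model name, mean pooling, temperature, and toy article pairs are illustrative assumptions, not the paper's reported setup.

```python
# Minimal sketch: continued pretraining with a contrastive objective over
# comparable Wikipedia article pairs (assumed InfoNCE; not the paper's
# exact configuration).
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-multilingual-cased"  # any off-the-shelf multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)

def encode(texts):
    """Mean-pool token embeddings into one vector per text."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=256, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)           # (B, H)

def contrastive_loss(src_texts, tgt_texts, temperature=0.05):
    """In-batch InfoNCE: each article's comparable counterpart in the
    other language is the positive; all other articles in the batch
    serve as negatives."""
    src = F.normalize(encode(src_texts), dim=-1)
    tgt = F.normalize(encode(tgt_texts), dim=-1)
    logits = src @ tgt.T / temperature      # (B, B) pairwise similarities
    labels = torch.arange(len(src_texts))   # diagonal entries are positives
    return F.cross_entropy(logits, labels)

# Toy comparable pairs (illustrative only, not from the paper's data).
loss = contrastive_loss(
    ["Paris is the capital of France.", "The moon orbits the Earth."],
    ["巴黎是法国的首都。", "月球绕地球运行。"],
)
loss.backward()  # update the encoder, then fine-tune on the retrieval task
```

Using in-batch negatives this way pulls comparable articles in different languages toward a shared representation while pushing unrelated articles apart, which is one plausible way to instill the cross-language mapping before retrieval fine-tuning.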
