Paper Title

Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0

Authors

De Toni, Francesco; Akiki, Christopher; de la Rosa, Javier; Fourrier, Clémentine; Manjavacas, Enrique; Schweter, Stefan; van Strien, Daniel

Abstract

In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. With a historical newspaper corpus in three languages as a test bed, we use prompts to extract possible named entities. Our results show that a naive approach to prompt-based zero-shot multilingual Named Entity Recognition is error-prone, but they also highlight the potential of such an approach for historical languages that lack labeled datasets. Moreover, we find that T0-like models can be probed to predict the publication date and language of a document, which could be highly relevant to the study of historical texts.
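The abstract does not specify the prompts or the post-processing used, so the following is only a minimal Python sketch of what "using prompts to extract possible named entities" and "probing for date and language" could look like. All template wording and all function names (`build_ner_prompt`, `parse_entity_list`, `parse_year`, `probe_document`, `fake_generate`) are hypothetical. The model call is abstracted as a `generate` callable, so a real checkpoint (for example, a T0 model loaded through Hugging Face `transformers`) could be plugged in; here a canned stub stands in for it. Discarding answer spans that do not occur in the source text is one illustrative guard against hallucinated entities, not a step described in the abstract.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical prompt templates for illustration only; they are NOT the
# prompts used in the paper.
NER_TEMPLATE = (
    "Input: {text}\n"
    "List all the {entity_type} mentioned in the input above, separated by "
    'commas. If there are none, answer "None".'
)
DATE_TEMPLATE = "{text}\nIn which year was this text most likely published?"
LANGUAGE_TEMPLATE = "{text}\nWhat language is this text written in?"


def build_ner_prompt(text: str, entity_type: str) -> str:
    """Fill the (hypothetical) NER template for one entity type."""
    return NER_TEMPLATE.format(text=text, entity_type=entity_type)


def parse_entity_list(generated: str, source_text: str) -> List[str]:
    """Split a comma-separated answer, dropping empty items, duplicates and
    any span that does not literally occur in the source text."""
    if generated.strip().lower() in {"", "none"}:
        return []
    entities: List[str] = []
    for candidate in generated.split(","):
        candidate = candidate.strip().rstrip(".")
        if candidate and candidate in source_text and candidate not in entities:
            entities.append(candidate)
    return entities


def parse_year(generated: str) -> Optional[int]:
    """Return the first four-digit year between 1500 and 2099, if any."""
    match = re.search(r"\b(1[5-9]\d{2}|20\d{2})\b", generated)
    return int(match.group(1)) if match else None


def probe_document(
    text: str,
    entity_types: List[str],
    generate: Callable[[str], str],
) -> Dict[str, object]:
    """Send one prompt per entity type, plus a date prompt and a language
    prompt, to a text-to-text model wrapped as `generate`."""
    entities = {
        etype: parse_entity_list(generate(build_ner_prompt(text, etype)), text)
        for etype in entity_types
    }
    year = parse_year(generate(DATE_TEMPLATE.format(text=text)))
    language = generate(LANGUAGE_TEMPLATE.format(text=text)).strip()
    return {"entities": entities, "year": year, "language": language}


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call that returns canned answers."""
    if "locations" in prompt:
        return "Paris, Londres, Paris"  # "Londres" is not in the text
    if "persons" in prompt:
        return "None"
    if "year" in prompt:
        return "Probably around 1871."
    return "French"


doc = "Le ministre est arrivé à Paris hier soir."
result = probe_document(doc, ["persons", "locations"], fake_generate)
print(result)
# {'entities': {'persons': [], 'locations': ['Paris']}, 'year': 1871, 'language': 'French'}
```

Keeping the model behind a plain callable separates prompt design and answer parsing from model loading, which makes it easy to compare several prompt wordings, or several checkpoints, on the same documents.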