Paper Title

Beyond Static Models and Test Sets: Benchmarking the Potential of Pre-trained Models Across Tasks and Languages

Paper Authors

Kabir Ahuja, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury

Paper Abstract

Although recent Massively Multilingual Language Models (MMLMs) like mBERT and XLMR support around 100 languages, most existing multilingual NLP benchmarks provide evaluation data in only a handful of these languages with little linguistic diversity. We argue that this makes the existing practices in multilingual evaluation unreliable and does not provide a full picture of the performance of MMLMs across the linguistic landscape. We propose that the recent work done in Performance Prediction for NLP tasks can serve as a potential solution for fixing benchmarking in Multilingual NLP by utilizing features related to data and language typology to estimate the performance of an MMLM on different languages. We compare performance prediction with translating test data with a case study on four different multilingual datasets, and observe that these methods can provide reliable estimates of the performance that are often on par with the translation-based approaches, without the need for any additional translation or evaluation costs.
