Paper Title
Revisiting Pre-trained Language Models and their Evaluation for Arabic Natural Language Understanding
Paper Authors
Paper Abstract
There is a growing body of work in recent years on developing pre-trained language models (PLMs) for the Arabic language. This work addresses two major problems in existing Arabic PLMs that constrain progress in Arabic NLU and NLG. First, existing Arabic PLMs are not well explored, and their pre-training can be improved significantly with a more methodical approach. Second, the literature lacks a systematic and reproducible evaluation of these models. In this work, we revisit both the pre-training and the evaluation of Arabic PLMs. In terms of pre-training, we explore improving Arabic LMs from three perspectives: the quality of the pre-training data, the size of the model, and the incorporation of character-level information. As a result, we release three new Arabic BERT-style models (JABER, Char-JABER, and SABER) and two T5-style models (AT5S and AT5B). In terms of evaluation, we conduct a comprehensive empirical study to systematically evaluate the performance of existing state-of-the-art models on ALUE, a leaderboard-powered benchmark for Arabic NLU tasks, and on a subset of the ARGEN benchmark for Arabic NLG tasks. We show that our models significantly outperform existing Arabic PLMs and achieve new state-of-the-art performance on discriminative and generative Arabic NLU and NLG tasks. Our models and the source code to reproduce our results will be made available shortly.