Paper Title

Interpreting Arabic Transformer Models

Paper Authors

Ahmed Abdelali, Nadir Durrani, Fahim Dalvi, Hassan Sajjad

Paper Abstract

Arabic is a Semitic language which is widely spoken, with many dialects. Given the success of pre-trained language models, many transformer models trained on Arabic and its dialects have surfaced. While these models have been compared with respect to downstream NLP tasks, no evaluation has been carried out to directly compare their internal representations. We probe how linguistic information is encoded in Arabic pretrained models trained on different varieties of the Arabic language. We perform a layer and neuron analysis on the models using three intrinsic tasks: two morphological tagging tasks based on MSA (Modern Standard Arabic) and dialectal POS tagging, and a dialect identification task. Our analysis yields interesting findings, such as: i) word morphology is learned at the lower and middle layers; ii) dialect identification requires more knowledge and is hence preserved even in the final layers; iii) despite a large overlap in their vocabulary, MSA-based models fail to capture the nuances of Arabic dialects; iv) neurons in the embedding layer are polysemous in nature, while neurons in the middle layers are exclusive to specific properties.
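
The layer and neuron analysis summarized above is commonly implemented by extracting per-layer hidden states from the pretrained model and training a lightweight linear probe on each layer, then inspecting the probe's weights to rank individual neurons. Below is a minimal sketch of that workflow, assuming the HuggingFace transformers library and scikit-learn, with the public checkpoint aubmindlab/bert-base-arabertv02 standing in for one of the Arabic models; the paper's exact models, datasets, and probe architecture are not specified here.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

# Hypothetical checkpoint: any Arabic BERT-style model exposes hidden states the same way.
MODEL_NAME = "aubmindlab/bert-base-arabertv02"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def layer_features(sentence: str):
    """Mean-pooled sentence vector per layer: list of (hidden_dim,) arrays,
    one for the embedding layer plus one for each transformer layer."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden_states = model(**inputs).hidden_states
    return [h.mean(dim=1).squeeze(0).numpy() for h in hidden_states]

def probe_per_layer(sentences, labels):
    """Train one linear probe per layer (e.g., for dialect identification)
    and return its accuracy on the training data as a rough probing signal."""
    per_layer = list(zip(*(layer_features(s) for s in sentences)))
    scores, probes = [], []
    for feats in per_layer:
        X = np.stack(feats)
        clf = LogisticRegression(max_iter=1000).fit(X, labels)
        probes.append(clf)
        scores.append(clf.score(X, labels))
    return scores, probes

def top_neurons(clf, k=10):
    """Neuron analysis sketch: rank feature dimensions ('neurons') of one
    layer's probe by the largest absolute weight any class assigns them."""
    salience = np.abs(clf.coef_).max(axis=0)
    return np.argsort(salience)[::-1][:k]
```

In the paper's setting, comparing probe accuracy across layers is what reveals where a property such as morphology or dialect is encoded; a faithful reproduction would evaluate on held-out data with appropriate controls rather than the training accuracy used in this sketch.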
