Paper Title

On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers

Authors

Mosbach, Marius, Khokhlova, Anna, Hedderich, Michael A., Klakow, Dietrich

Abstract

Fine-tuning pre-trained contextualized embedding models has become an integral part of the NLP pipeline. At the same time, probing has emerged as a way to investigate the linguistic knowledge captured by pre-trained models. However, little is understood about how fine-tuning affects the representations of pre-trained models, and thereby the linguistic knowledge they encode. This paper contributes towards closing this gap. We study three different pre-trained models: BERT, RoBERTa, and ALBERT, and investigate through sentence-level probing how fine-tuning affects their representations. We find that for some probing tasks fine-tuning leads to substantial changes in accuracy, possibly suggesting that fine-tuning introduces or even removes linguistic knowledge from a pre-trained model. These changes, however, vary greatly across models, fine-tuning tasks, and probing tasks. Our analysis reveals that while fine-tuning does change the representations of a pre-trained model, and these changes are typically larger for higher layers, only in very few cases does fine-tuning improve probing accuracy beyond what is achieved by simply using the pre-trained model with a strong pooling method. Based on our findings, we argue that both positive and negative effects of fine-tuning on probing accuracy require careful interpretation.
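The sentence-level probing setup the abstract refers to can be illustrated with a minimal sketch: pool a sentence's token embeddings into a single vector (here, mean pooling, one of the pooling methods the paper alludes to), then train a small linear classifier ("probe") on those pooled vectors. This is not the paper's code; the transformer embeddings are replaced by random placeholder arrays and the labels are synthetic, purely to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for contextualized token embeddings from a frozen pre-trained
# model (e.g. one BERT layer): each sentence is a [num_tokens, dim] array.
# In a real probing experiment these would come from the transformer.
dim, n_sent = 16, 200
sentences = [rng.normal(size=(rng.integers(5, 12), dim)) for _ in range(n_sent)]
# Synthetic binary labels tied to the first embedding dimension,
# so the probe has a recoverable signal to learn.
labels = np.array([int(s[:, 0].mean() > 0) for s in sentences])

def mean_pool(tokens):
    """Sentence representation = average of its token embeddings."""
    return tokens.mean(axis=0)

X = np.stack([mean_pool(s) for s in sentences])  # [n_sent, dim]

# Linear probe: logistic regression trained by plain gradient descent.
w, b = np.zeros(dim), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
    grad = p - labels                       # dLoss/dlogits
    w -= 0.1 * (X.T @ grad) / n_sent
    b -= 0.1 * grad.mean()

acc = ((X @ w + b > 0).astype(int) == labels).mean()
```

The probe's accuracy on the pooled representations is the quantity the paper tracks before and after fine-tuning; comparing the two indicates whether fine-tuning made the probed property more or less linearly recoverable.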
