Paper Title

Investigating Transferability in Pretrained Language Models

Paper Authors

Alex Tamkin, Trisha Singh, Davide Giovanardi, Noah Goodman

Paper Abstract

How does language model pretraining help transfer learning? We consider a simple ablation technique for determining the impact of each pretrained layer on transfer task performance. This method, partial reinitialization, involves replacing different layers of a pretrained model with random weights, then finetuning the entire model on the transfer task and observing the change in performance. This technique reveals that in BERT, layers with high probing performance on downstream GLUE tasks are neither necessary nor sufficient for high accuracy on those tasks. Furthermore, the benefit of using pretrained parameters for a layer varies dramatically with finetuning dataset size: parameters that provide tremendous performance improvement when data is plentiful may provide negligible benefits in data-scarce settings. These results reveal the complexity of the transfer learning process, highlighting the limitations of methods that operate on frozen models or single data samples.
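
The abstract describes the partial-reinitialization ablation in prose. Below is a minimal, hypothetical sketch of how such an ablation might be set up for BERT using PyTorch and the HuggingFace transformers library; the helper function partially_reinitialize, the chosen layer indices, and the checkpoint are illustrative assumptions, not the authors' released code.

```python
from transformers import BertForSequenceClassification

def partially_reinitialize(model, layers_to_reset):
    """Replace the selected encoder layers' pretrained weights with fresh random
    initializations, leaving all other pretrained parameters untouched.
    (Illustrative helper; not from the paper's implementation.)"""
    for idx in layers_to_reset:
        # Re-run BERT's own weight initializer on every submodule of this layer,
        # discarding that layer's pretrained weights.
        model.bert.encoder.layer[idx].apply(model._init_weights)
    return model

# Example: reinitialize the top two encoder layers of bert-base-uncased,
# then finetune the entire model on a transfer task as usual.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model = partially_reinitialize(model, layers_to_reset=[10, 11])
# ... finetune `model` on a GLUE task and compare accuracy against the
# unablated pretrained baseline to estimate those layers' contribution.
```

Repeating this for different layer subsets and finetuning-dataset sizes would reproduce, in spirit, the comparisons the abstract describes.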
