Paper Title
In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
Paper Authors
Paper Abstract
Given the success with in-context learning of large pre-trained language models, we introduce in-context learning distillation to transfer in-context few-shot learning ability from large models to smaller models. We propose to combine in-context learning objectives with language modeling objectives to distill both the ability to read in-context examples and task knowledge to the smaller models. We perform in-context learning distillation under two different few-shot learning paradigms: Meta In-context Tuning (Meta-ICT) and Multitask In-context Tuning (Multitask-ICT). Multitask-ICT performs better on multitask few-shot learning but also requires more computation than Meta-ICT. Our method shows consistent improvements for both Meta-ICT and Multitask-ICT on two benchmarks: LAMA and CrossFit. Our extensive experiments and analysis reveal that in-context learning objectives and language modeling objectives are complementary under the Multitask-ICT paradigm. In-context learning objectives achieve the best performance when combined with language modeling objectives.
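The abstract does not spell out how the two objectives are combined; the following is a minimal sketch under assumed design choices (a KL-based distillation term on the teacher's soft token predictions, a standard cross-entropy language modeling term, and an illustrative weighting `alpha` and `temperature` not taken from the paper).

```python
# Minimal sketch (not the paper's exact formulation): combining an
# in-context learning distillation objective with a language modeling
# objective when training a smaller student model.
import torch
import torch.nn.functional as F

def combined_distillation_loss(student_logits, teacher_logits, lm_labels,
                               alpha=0.5, temperature=2.0):
    """student_logits, teacher_logits: (batch, seq_len, vocab);
    lm_labels: (batch, seq_len) token ids, with -100 marking ignored positions.
    alpha and temperature are illustrative hyperparameters (assumptions)."""
    # In-context learning distillation term: match the student's token
    # distribution to the teacher's soft predictions on the same
    # in-context-formatted input.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Language modeling term: standard cross-entropy against the target tokens.
    lm_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        lm_labels.view(-1),
        ignore_index=-100,
    )

    # Weighted combination of the two objectives, which the abstract
    # reports to be complementary under Multitask-ICT.
    return alpha * kd_loss + (1 - alpha) * lm_loss


# Usage with random tensors, just to show expected shapes.
if __name__ == "__main__":
    batch, seq_len, vocab = 2, 16, 100
    student = torch.randn(batch, seq_len, vocab)
    teacher = torch.randn(batch, seq_len, vocab)
    labels = torch.randint(0, vocab, (batch, seq_len))
    print(combined_distillation_loss(student, teacher, labels))
```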