Paper Title
In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
Paper Authors
Paper Abstract
Given the success with in-context learning of large pre-trained language models, we introduce in-context learning distillation to transfer in-context few-shot learning ability from large models to smaller models. We propose to combine in-context learning objectives with language modeling objectives to distill both the ability to read in-context examples and task knowledge to the smaller models. We perform in-context learning distillation under two different few-shot learning paradigms: Meta In-context Tuning (Meta-ICT) and Multitask In-context Tuning (Multitask-ICT). Multitask-ICT performs better on multitask few-shot learning but also requires more computation than Meta-ICT. Our method shows consistent improvements for both Meta-ICT and Multitask-ICT on two benchmarks: LAMA and CrossFit. Our extensive experiments and analysis reveal that in-context learning objectives and language modeling objectives are complementary under the Multitask-ICT paradigm. In-context learning objectives achieve the best performance when combined with language modeling objectives.
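The abstract does not spell out how the two objectives are combined; the following is a minimal sketch under assumed design choices (a KL-based distillation term on the teacher's soft token predictions, a standard cross-entropy language modeling term, and an illustrative weighting `alpha` and `temperature` not taken from the paper).

```python
# Minimal sketch (not the paper's exact formulation): combining an
# in-context learning distillation objective with a language modeling
# objective when training a smaller student model.
import torch
import torch.nn.functional as F

def combined_distillation_loss(student_logits, teacher_logits, lm_labels,
                               alpha=0.5, temperature=2.0):
    """student_logits, teacher_logits: (batch, seq_len, vocab);
    lm_labels: (batch, seq_len) token ids, with -100 marking ignored positions.
    alpha and temperature are illustrative hyperparameters (assumptions)."""
    # In-context learning distillation term: match the student's token
    # distribution to the teacher's soft predictions on the same
    # in-context-formatted input.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Language modeling term: standard cross-entropy against the target tokens.
    lm_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        lm_labels.view(-1),
        ignore_index=-100,
    )

    # Weighted combination of the two objectives, which the abstract
    # reports to be complementary under Multitask-ICT.
    return alpha * kd_loss + (1 - alpha) * lm_loss


# Usage with random tensors, just to show expected shapes.
if __name__ == "__main__":
    batch, seq_len, vocab = 2, 16, 100
    student = torch.randn(batch, seq_len, vocab)
    teacher = torch.randn(batch, seq_len, vocab)
    labels = torch.randint(0, vocab, (batch, seq_len))
    print(combined_distillation_loss(student, teacher, labels))
```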