Paper Title

Supervised Seeded Iterated Learning for Interactive Language Learning

Paper Authors

Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville

Paper Abstract

Language drift has been one of the major obstacles to training language models through interaction. When word-based conversational agents are trained towards completing a task, they tend to invent their own language rather than leveraging natural language. In recent literature, two general methods partially counter this phenomenon: Supervised Self-play (S2P) and Seeded Iterated Learning (SIL). While S2P jointly trains interactive and supervised losses to counter the drift, SIL changes the training dynamics to prevent language drift from occurring. In this paper, we first highlight their respective weaknesses, i.e., late-stage training collapse and higher negative log-likelihood when evaluated on a human corpus. Given these observations, we introduce Supervised Seeded Iterated Learning (SSIL), which combines both methods to minimize their respective weaknesses. We then show the effectiveness of SSIL in the language-drift translation game.
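
The abstract names the two ingredients but not how they fit together, so the following toy sketch (not the authors' implementation) makes the distinction concrete: an S2P-style joint interactive-plus-supervised loss nested inside an SIL-style generational loop in which a frozen teacher is imitated by a fresh student. Everything below is an assumption for illustration, including the linear toy models, the placeholder losses, the alpha weight, and the phase lengths; the exact way SSIL combines the two methods is specified in the paper itself, not in this sketch.

```python
# Illustrative sketch only (not the paper's code): S2P-style joint training of an
# interactive and a supervised loss, wrapped in an SIL-style generational loop
# where a frozen teacher is imitated by a fresh student. All names, models,
# losses, and hyperparameters here are assumed for the toy example.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, DIM = 32, 16

speaker = nn.Linear(DIM, VOCAB)   # toy stand-in for a seq2seq speaker agent
optimizer = torch.optim.Adam(speaker.parameters(), lr=1e-3)

def supervised_loss(model, x, y):
    # Cross-entropy against human-annotated data: anchors the agent to natural language.
    return nn.functional.cross_entropy(model(x), y)

def interactive_loss(model, x):
    # Placeholder for the task (self-play) objective; here a simple confidence surrogate.
    log_probs = model(x).log_softmax(-1)
    return -log_probs.max(-1).values.mean()

def imitation_loss(student, teacher, x):
    # Iterated-learning transmission step: the student imitates the teacher's outputs.
    with torch.no_grad():
        targets = teacher(x).argmax(-1)
    return nn.functional.cross_entropy(student(x), targets)

human_x, human_y = torch.randn(64, DIM), torch.randint(0, VOCAB, (64,))
task_x = torch.randn(64, DIM)
alpha = 1.0  # assumed supervised/interactive trade-off weight

for generation in range(3):                      # SIL-style outer loop over generations
    for _ in range(50):                          # interactive phase with S2P-style joint loss
        loss = (interactive_loss(speaker, task_x)
                + alpha * supervised_loss(speaker, human_x, human_y))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    teacher = copy.deepcopy(speaker)             # freeze the trained agent as the teacher
    student = nn.Linear(DIM, VOCAB)              # fresh (or seed-initialized) student
    student_opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(50):                          # imitation (transmission) phase
        loss = imitation_loss(student, teacher, human_x)
        student_opt.zero_grad()
        loss.backward()
        student_opt.step()

    speaker = student                            # the student seeds the next generation
    optimizer = torch.optim.Adam(speaker.parameters(), lr=1e-3)
```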
