Paper Title
Grounded Language Learning Fast and Slow
Authors
Abstract
Recent work has shown that large text-based neural language models, trained with conventional supervised learning objectives, acquire a surprising propensity for few- and one-shot learning. Here, we show that an embodied agent situated in a simulated 3D world, and endowed with a novel dual-coding external memory, can exhibit similar one-shot word learning when trained with conventional reinforcement learning algorithms. After a single introduction to a novel object via continuous visual perception and a language prompt ("This is a dax"), the agent can re-identify the object and manipulate it as instructed ("Put the dax on the bed"). In doing so, it seamlessly integrates short-term, within-episode knowledge of the appropriate referent for the word "dax" with long-term lexical and motor knowledge acquired across episodes (i.e. "bed" and "putting"). We find that, under certain training conditions and with a particular memory writing mechanism, the agent's one-shot word-object binding generalizes to novel exemplars within the same ShapeNet category, and is effective in settings with unfamiliar numbers of objects. We further show how dual-coding memory can be exploited as a signal for intrinsic motivation, stimulating the agent to seek names for objects that may be useful for later executing instructions. Together, the results demonstrate that deep neural networks can exploit meta-learning, episodic memory and an explicitly multi-modal environment to account for 'fast-mapping', a fundamental pillar of human cognitive development and a potentially transformative capacity for agents that interact with human users.
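The abstract describes the dual-coding external memory only at a high level. The following is a minimal sketch of one way such a memory could support one-shot word-object binding. It assumes, as an illustration and not as the paper's stated design, that each write stores a language embedding and a visual embedding from the same timestep in parallel slots, and that reading uses dot-product attention over one modality to retrieve the other. The class and method names (`DualCodingMemory`, `read_vision_by_language`, `read_language_by_vision`) are hypothetical, and the hand-picked toy vectors stand in for learned encoder outputs.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class DualCodingMemory:
    """Hypothetical dual-coding memory sketch (not the paper's exact design).

    Assumption: every write stores a language embedding and a visual
    embedding from the same timestep in parallel slots, so one modality
    can serve as keys for retrieving the other.
    """

    def __init__(self) -> None:
        self.lang_slots: List[Vector] = []
        self.vis_slots: List[Vector] = []

    def write(self, lang: Vector, vis: Vector) -> None:
        # Store both modalities side by side at the same slot index.
        self.lang_slots.append(list(lang))
        self.vis_slots.append(list(vis))

    def _read(self, query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
        if not keys:
            raise ValueError("cannot read from an empty memory")
        # Dot-product attention: match the query against one modality's keys...
        weights = softmax([dot(query, k) for k in keys])
        # ...and return the attention-weighted sum of the other modality.
        dim = len(values[0])
        return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

    def read_vision_by_language(self, lang_query: Vector) -> Vector:
        """Retrieve a visual embedding given a language query (e.g. 'dax')."""
        return self._read(lang_query, self.lang_slots, self.vis_slots)

    def read_language_by_vision(self, vis_query: Vector) -> Vector:
        """Retrieve a language embedding given a visual query."""
        return self._read(vis_query, self.vis_slots, self.lang_slots)


if __name__ == "__main__":
    # Toy, hand-picked embeddings standing in for learned encoder outputs.
    memory = DualCodingMemory()
    memory.write(lang=[10.0, 0.0], vis=[1.0, 0.0, 0.0])  # "This is a dax"
    memory.write(lang=[0.0, 10.0], vis=[0.0, 1.0, 0.0])  # "This is a blicket"
    # Later, "Put the dax on the bed": look up what a dax looks like.
    print(memory.read_vision_by_language([10.0, 0.0]))
```

In this toy run, the language query for "dax" attends almost entirely to the slot written during the "This is a dax" introduction, so the retrieved visual embedding is close to the dax's. This loosely mirrors the abstract's example of binding a new word to an object within an episode. Keeping the two modalities in separate but aligned slots is what lets the same memory be queried in either direction.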