Paper Title

Recursive Decoding: A Situated Cognition Approach to Compositional Generation in Grounded Language Understanding

Paper Authors

Matthew Setzler, Scott Howland, Lauren Phillips

Paper Abstract

Compositional generalization is a troubling blind spot for neural language models. Recent efforts have presented techniques for improving a model's ability to encode novel combinations of known inputs, but less work has focused on generating novel combinations of known outputs. Here we focus on this latter "decode-side" form of generalization in the context of gSCAN, a synthetic benchmark for compositional generalization in grounded language understanding. We present Recursive Decoding (RD), a novel procedure for training and using seq2seq models, targeted towards decode-side generalization. Rather than generating an entire output sequence in one pass, models are trained to predict one token at a time. Inputs (i.e., the external gSCAN environment) are then incrementally updated based on predicted tokens, and re-encoded for the next decoder time step. RD thus decomposes a complex, out-of-distribution sequence generation task into a series of incremental predictions that each resemble what the model has already seen during training. RD yields dramatic improvement on two previously neglected generalization tasks in gSCAN. We provide analyses to elucidate these gains over failure of a baseline, and then discuss implications for generalization in naturalistic grounded language understanding, and seq2seq more generally.
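The predict-update-re-encode loop described in the abstract can be sketched in a few lines of Python. This is a minimal, illustrative sketch of the Recursive Decoding idea, not the authors' released implementation; model.encode, model.predict_next_token, and env.step are hypothetical interfaces standing in for a seq2seq encoder/decoder and a gSCAN-style environment.

# Minimal sketch of the Recursive Decoding (RD) loop at inference time.
# Hypothetical interfaces: model.encode, model.predict_next_token, and
# env.step are illustrative assumptions, not the authors' implementation.

def recursive_decode(model, command, env, max_steps=100, eos_token="<EOS>"):
    """Generate an action sequence one token at a time, re-encoding the
    updated environment before every single-token prediction."""
    actions = []
    for _ in range(max_steps):
        # Re-encode the instruction together with the *current* world state.
        encoded = model.encode(command, env)
        # Predict only the next action token: an incremental, single-step
        # prediction that resembles what the model saw during training.
        token = model.predict_next_token(encoded)
        if token == eos_token:
            break
        actions.append(token)
        # Apply the predicted action to the external environment (e.g. walk
        # or turn), so the next step is re-encoded from the resulting state.
        env = env.step(token)
    return actions

Under these assumptions, the full output sequence is never generated in one pass; it is the concatenation of per-step predictions, each conditioned on a freshly re-encoded input.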
