Paper Title

When Can Transformers Ground and Compose: Insights from Compositional Generalization Benchmarks

Authors

Ankur Sikarwar, Arkil Patel, Navin Goyal

Abstract

Humans can reason compositionally whilst grounding language utterances to the real world. Recent benchmarks like ReaSCAN use navigation tasks grounded in a grid world to assess whether neural models exhibit similar capabilities. In this work, we present a simple transformer-based model that outperforms specialized architectures on ReaSCAN and a modified version of gSCAN. On analyzing the task, we find that identifying the target location in the grid world is the main challenge for the models. Furthermore, we show that a particular split in ReaSCAN, which tests depth generalization, is unfair. On an amended version of this split, we show that transformers can generalize to deeper input structures. Finally, we design a simpler grounded compositional generalization task, RefEx, to investigate how transformers reason compositionally. We show that a single self-attention layer with a single head generalizes to novel combinations of object attributes. Moreover, we derive a precise mathematical construction of the transformer's computations from the learned network. Overall, we provide valuable insights about the grounded compositional generalization task and the behaviour of transformers on it, which would be useful for researchers working in this area.
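
To make the architectural claim concrete, below is a minimal sketch (not the authors' released code) of the kind of component the abstract refers to: a single self-attention layer with a single head operating over a sequence of embeddings, e.g. command tokens and grid-cell representations. The module name, embedding size, and toy input are illustrative assumptions, not values from the paper.

```python
# Minimal single-head self-attention sketch, assuming PyTorch.
# All names and dimensions below are illustrative, not taken from the paper.
import torch
import torch.nn as nn


class SingleHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # One head: separate linear projections for queries, keys, and values.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model), e.g. command-token and grid-cell embeddings.
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v  # contextualized representations, same shape as x


# Toy usage: 1 example, 8 input positions, 16-dimensional embeddings.
layer = SingleHeadSelfAttention(d_model=16)
out = layer(torch.randn(1, 8, 16))
print(out.shape)  # torch.Size([1, 8, 16])
```

The paper's analysis of how such a single-head layer composes object attributes is derived from the learned weights; this sketch only shows the standard attention computation that analysis would be performed on.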
