Paper Title

Revisiting the Compositional Generalization Abilities of Neural Sequence Models

Paper Authors

Arkil Patel, Satwik Bhattamishra, Phil Blunsom, Navin Goyal

Paper Abstract

Compositional generalization is a fundamental trait in humans, allowing us to effortlessly combine known phrases to form novel sentences. Recent works have claimed that standard seq-to-seq models severely lack the ability to compositionally generalize. In this paper, we focus on one-shot primitive generalization as introduced by the popular SCAN benchmark. We demonstrate that modifying the training distribution in simple and intuitive ways enables standard seq-to-seq models to achieve near-perfect generalization performance, thereby showing that their compositional generalization abilities were previously underestimated. We perform a detailed empirical analysis of this phenomenon. Our results indicate that the generalization performance of models is highly sensitive to the characteristics of the training data, which should be carefully considered when designing such benchmarks in the future.
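To make the one-shot primitive setting concrete, the sketch below illustrates the data format of SCAN's add-jump split: the primitive "jump" appears in training only in isolation, while the test set requires composing it with modifiers that were only ever seen with other primitives. The `augment_with_extra_primitives` helper and the `prim{i}` tokens are hypothetical illustrations of the kind of "simple and intuitive" training-distribution modification the abstract refers to, not the authors' exact procedure.

```python
# Minimal sketch of SCAN-style one-shot primitive generalization (add-jump split).
# This illustrates the data format only; the augmentation helper below is a
# hypothetical example, not the paper's exact method.

# Training data: the held-out primitive "jump" appears only in isolation,
# while compositional patterns are demonstrated with other primitives.
train = [
    ("jump", "I_JUMP"),
    ("walk", "I_WALK"),
    ("walk twice", "I_WALK I_WALK"),
    ("run and walk", "I_RUN I_WALK"),
]

# Test data: the model must compose "jump" with modifiers it never saw it with.
test = [
    ("jump twice", "I_JUMP I_JUMP"),
    ("jump and walk", "I_JUMP I_WALK"),
]

def augment_with_extra_primitives(pairs, n_extra=10):
    """Hypothetical augmentation: enrich the training distribution with many
    additional isolated primitives, so that one-shot tokens like "jump" are
    no longer rare outliers in the training data."""
    extra = [(f"prim{i}", f"PRIM{i}") for i in range(n_extra)]
    return pairs + extra

augmented_train = augment_with_extra_primitives(train)
```

Under this kind of modification, the standard seq-to-seq training pipeline is left unchanged; only the distribution of primitives in the training set is altered, which is the lever the paper studies.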
