Paper Title

Can RNNs trained on harder subject-verb agreement instances still perform well on easier ones?

Paper Authors

Hritik Bansal, Gantavya Bhatt, Sumeet Agarwal

Paper Abstract

Previous work suggests that RNNs trained on natural language corpora can capture number agreement well for simple sentences but perform less well when sentences contain agreement attractors: intervening nouns between the verb and the main subject with grammatical number opposite to the latter. This suggests these models may not learn the actual syntax of agreement, but rather infer shallower heuristics such as `agree with the recent noun'. In this work, we investigate RNN models with varying inductive biases trained on selectively chosen `hard' agreement instances, i.e., sentences with at least one agreement attractor. For these the verb number cannot be predicted using a simple linear heuristic, and hence they might help provide the model additional cues for hierarchical syntax. If RNNs can learn the underlying agreement rules when trained on such hard instances, then they should generalize well to other sentences, including simpler ones. However, we observe that several RNN types, including the ONLSTM which has a soft structural inductive bias, surprisingly fail to perform well on sentences without attractors when trained solely on sentences with attractors. We analyze how these selectively trained RNNs compare to the baseline (training on a natural distribution of agreement attractors) along the dimensions of number agreement accuracy, representational similarity, and performance across different syntactic constructions. Our findings suggest that RNNs trained on our hard agreement instances still do not capture the underlying syntax of agreement, but rather tend to overfit the training distribution in a way which leads them to perform poorly on `easy' out-of-distribution instances. Thus, while RNNs are powerful models which can pick up non-trivial dependency patterns, inducing them to do so at the level of syntax rather than surface remains a challenge.
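To make the notion of a `hard' agreement instance concrete, below is a minimal Python sketch (not the authors' code) of how sentences with at least one agreement attractor might be separated from attractor-free ones. The AgreementExample fields and the count_attractors helper are hypothetical, and the sketch assumes token-level grammatical-number annotations and subject/verb positions are available for each sentence.

```python
# Minimal sketch (assumed data format, not the paper's implementation):
# split an annotated agreement dataset into 'hard' instances (>= 1 attractor,
# i.e. an intervening noun whose number differs from the subject's) and
# 'easy' instances (no attractors).

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AgreementExample:
    tokens: List[str]                   # sentence tokens
    noun_numbers: List[Optional[str]]   # "sing"/"plur" for nouns, None otherwise
    subject_idx: int                    # position of the main subject
    verb_idx: int                       # position of the agreeing verb


def count_attractors(ex: AgreementExample) -> int:
    """Count intervening nouns between subject and verb whose number
    differs from the subject's grammatical number."""
    subject_number = ex.noun_numbers[ex.subject_idx]
    attractors = 0
    for i in range(ex.subject_idx + 1, ex.verb_idx):
        num = ex.noun_numbers[i]
        if num is not None and num != subject_number:
            attractors += 1
    return attractors


def split_hard_easy(data: List[AgreementExample]):
    """'Hard' = at least one attractor; 'easy' = none."""
    hard = [ex for ex in data if count_attractors(ex) >= 1]
    easy = [ex for ex in data if count_attractors(ex) == 0]
    return hard, easy


# Example: "The keys to the cabinet are on the table."
# 'cabinet' (singular) intervenes between the plural subject 'keys'
# and the verb 'are', so this sentence counts as a hard instance.
example = AgreementExample(
    tokens="The keys to the cabinet are on the table .".split(),
    noun_numbers=[None, "plur", None, None, "sing",
                  None, None, None, "sing", None],
    subject_idx=1,
    verb_idx=5,
)
assert count_attractors(example) == 1
```

Under this framing, the baseline condition in the abstract corresponds to training on the natural mix of both splits, while the selective condition trains only on the hard split and evaluates generalization to the easy one.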
