Paper Title
RobustLR: Evaluating Robustness to Logical Perturbation in Deductive Reasoning
Paper Authors
Paper Abstract
Transformers have been shown to be able to perform deductive reasoning on a logical rulebase containing rules and statements written in English natural language. While the progress is promising, it is currently unclear whether these models indeed perform logical reasoning by understanding the underlying logical semantics of the language. To this end, we propose RobustLR, a suite of evaluation datasets that test the robustness of these models to minimal logical edits in rulebases and to some standard logical equivalence conditions. In our experiments with RoBERTa and T5, we find that the models trained in prior works do not perform consistently across the different perturbations in RobustLR, showing that they are not robust to the proposed logical perturbations. Further, we find that the models have particular difficulty learning the logical negation and disjunction operators. Overall, using our evaluation sets, we demonstrate some shortcomings of deductive-reasoning language models, which can eventually help in designing better models for logical reasoning over natural language. All the datasets and code base have been made publicly available.
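To make the evaluation idea concrete, the following is a minimal sketch of a consistency check under one standard logical equivalence condition (rewriting a rule as its contrapositive). The `predict` wrapper, the rule templates, and the fact/statement strings are illustrative assumptions for this sketch, not the paper's actual API or data format; a model robust to the perturbation should assign the same entailment label before and after the rewrite.

```python
# Minimal sketch of a RobustLR-style consistency check, assuming a
# hypothetical predict(rulebase, statement) -> str wrapper around a
# fine-tuned model (e.g., RoBERTa or T5). All names and templates here
# are illustrative assumptions, not the paper's implementation.

def contrapositive(antecedent: str, consequent: str) -> tuple[str, str]:
    """Rewrite 'If A then B' as the logically equivalent
    'If not B then not A' (a standard equivalence condition)."""
    return (f"it is not true that {consequent}",
            f"it is not true that {antecedent}")

def is_consistent(predict, facts, antecedent, consequent, statement) -> bool:
    """Return True if the model gives the same entailment label on the
    original rulebase and on the logically equivalent, perturbed one."""
    original_rule = f"If {antecedent} then {consequent}."
    neg_b, neg_a = contrapositive(antecedent, consequent)
    perturbed_rule = f"If {neg_b} then {neg_a}."
    label_original = predict(facts + [original_rule], statement)
    label_perturbed = predict(facts + [perturbed_rule], statement)
    return label_original == label_perturbed

if __name__ == "__main__":
    # Trivial stand-in model that ignores its input, used only to show
    # how the check would be invoked.
    dummy_predict = lambda rulebase, statement: "entailed"
    ok = is_consistent(
        dummy_predict,
        facts=["the cat is nice"],
        antecedent="the cat is nice",
        consequent="the cat is kind",
        statement="the cat is kind",
    )
    print("consistent on contrapositive perturbation:", ok)
```

Aggregating this check over many rulebases would give the kind of consistency score the abstract refers to; analogous perturbations could target negation or disjunction, the operators the paper reports as hardest to learn.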