Paper Title

Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning

Authors

Antonia Creswell, Murray Shanahan, Irina Higgins

Abstract

Large language models (LLMs) have been shown to be capable of impressive few-shot generalisation to new tasks. However, they still tend to perform poorly on multi-step logical reasoning problems. Here we carry out a comprehensive evaluation of LLMs on 50 tasks that probe different aspects of logical reasoning. We show that language models tend to perform fairly well at single-step inference or entailment tasks, but struggle to chain together multiple reasoning steps to solve more complex problems. In light of this, we propose a Selection-Inference (SI) framework that exploits pre-trained LLMs as general processing modules, and alternates between selection and inference to generate a series of interpretable, causal reasoning steps leading to the final answer. We show that a 7B parameter LLM used within the SI framework in a 5-shot generalisation setting, with no fine-tuning, yields a performance improvement of over 100% compared to an equivalent vanilla baseline on a suite of 10 logical reasoning tasks. The same model in the same setting even outperforms a significantly larger 280B parameter baseline on the same suite of tasks. Moreover, answers produced by the SI framework are accompanied by a causal natural-language-based reasoning trace, which has important implications for the safety and trustworthiness of the system.
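The alternation the abstract describes can be sketched as a simple loop. This is a minimal illustrative sketch, not the authors' implementation: in the paper, both `select` and `infer` are few-shot-prompted LLM calls, whereas here they are stubbed with string matching over toy "X is a Y" statements so the loop is runnable. All function names and the fact format are assumptions made for this sketch.

```python
def select(context):
    """Selection step: pick one fact and one rule from the context that can
    be combined to derive something new. (In the SI framework this choice is
    made by a prompted LLM; here it is a hand-written stub.)"""
    for fact in context:
        if fact.startswith("every "):
            continue  # rules are not subjects of further selection here
        subj, _, cat = fact.partition(" is a ")
        for rule in context:
            if rule.startswith(f"every {cat} is a"):
                conclusion = f"{subj} is a {rule.rsplit(' is a ', 1)[1]}"
                if conclusion not in context:  # skip already-derived facts
                    return fact, rule
    return None


def infer(fact, rule):
    """Inference step: derive a new fact from the selected fact and rule.
    (In the SI framework this, too, is a separate LLM call, which is what
    keeps each reasoning step causal and interpretable.)"""
    subj = fact.split(" is a ")[0]
    return f"{subj} is a {rule.rsplit(' is a ', 1)[1]}"


def selection_inference(context, question, max_steps=5):
    """Alternate selection and inference, accumulating a reasoning trace,
    until the question is answered or no new fact can be derived."""
    context = list(context)
    trace = []
    for _ in range(max_steps):
        picked = select(context)
        if picked is None:
            break
        fact, rule = picked
        new_fact = infer(fact, rule)
        trace.append((fact, rule, new_fact))  # interpretable step record
        context.append(new_fact)
        if new_fact == question:
            break
    return context, trace
```

For example, starting from `["rex is a dog", "every dog is a mammal", "every mammal is a vertebrate"]` and the question `"rex is a vertebrate"`, the loop performs two selection/inference steps and the trace records each (fact, rule, conclusion) triple, mirroring the causal reasoning trace the abstract highlights.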
