雷神：挥舞锤子以整合语言模型并自动化定理掠夺者

论文标题

雷神：挥舞锤子以整合语言模型并自动化定理掠夺者

Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers

论文作者

Jiang, Albert Q., Li, Wenda, Tworkowski, Szymon, Czechowski, Konrad, Odrzygóźdź, Tomasz, Miłoś, Piotr, Wu, Yuhuai, Jamnik, Mateja

论文摘要

在定理中，从大型库中选择有用的场所以解锁给定猜想的证据至关重要。这给所有定理掠夺者，尤其是基于语言模型的掠夺者带来了一个挑战，因为它们相对无法以文本形式进行大量前提推理。本文介绍了Thor，这是一个集成语言模型的框架，并自动化了定理掠夺以克服这一困难。在Thor中，一类称为Hammers的方法，这些方法利用自动定理掠夺的力量进行前提选择，而所有其他任务都指定为语言模型。 Thor将PISA数据集中语言模型的成功率从$ 39 \％$增加到$ 57 \％$，而解决$ 8.2 \％$的问题的问题既不是语言模型，也不是自动化的定理掠夺者都能独自解决的。此外，在计算预算的情况下，Thor可以在与最佳现有方法相提并论的minif2f数据集上实现成功率。可以通过我们提供的直接协议对大多数流行的交互式定理进行实例化。

In theorem proving, the task of selecting useful premises from a large library to unlock the proof of a given conjecture is crucially important. This presents a challenge for all theorem provers, especially the ones based on language models, due to their relative inability to reason over huge volumes of premises in text form. This paper introduces Thor, a framework integrating language models and automated theorem provers to overcome this difficulty. In Thor, a class of methods called hammers that leverage the power of automated theorem provers are used for premise selection, while all other tasks are designated to language models. Thor increases a language model's success rate on the PISA dataset from $39\%$ to $57\%$, while solving $8.2\%$ of problems neither language models nor automated theorem provers are able to solve on their own. Furthermore, with a significantly smaller computational budget, Thor can achieve a success rate on the MiniF2F dataset that is on par with the best existing methods. Thor can be instantiated for the majority of popular interactive theorem provers via a straightforward protocol we provide.

下载PDF全文

下载文献需遵守相关版权规定

论文标题