Paper Title

STaR: Bootstrapping Reasoning With Reasoning

Authors

Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah D. Goodman

Abstract

Generating step-by-step "chain-of-thought" rationales improves language model performance on complex reasoning tasks like mathematics or commonsense question-answering. However, inducing language model rationale generation currently requires either constructing massive rationale datasets or sacrificing accuracy by using only few-shot inference. We propose a technique to iteratively leverage a small number of rationale examples and a large dataset without rationales, to bootstrap the ability to perform successively more complex reasoning. This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers, and performs comparably to fine-tuning a 30$\times$ larger state-of-the-art language model on CommonsenseQA. Thus, STaR lets a model improve itself by learning from its own generated reasoning.
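The loop the abstract describes (generate, rationalize failures given the correct answer, fine-tune on successes, repeat) can be sketched in toy form. Everything here is a hypothetical illustration: `ToyModel`, its lookup-table "fine-tuning", and the `star` driver stand in for a real language-model pipeline and are not the paper's implementation.

```python
class ToyModel:
    """Hypothetical stand-in for a language model: answers come from a
    lookup table that grows as we 'fine-tune'."""

    def __init__(self, knowledge=None):
        self.knowledge = dict(knowledge or {})

    def generate(self, question, hint=None):
        """Return a (rationale, answer) pair; `hint` mimics rationalization,
        where the correct answer is shown and a supporting rationale is produced."""
        if hint is not None:
            return f"rationale given answer {hint}", hint
        answer = self.knowledge.get(question, "unknown")
        return f"rationale concluding {answer}", answer

    def fine_tune(self, examples):
        """'Fine-tuning' here just memorizes (question -> answer) pairs."""
        updated = ToyModel(self.knowledge)
        for question, _rationale, answer in examples:
            updated.knowledge[question] = answer
        return updated


def star(model, data, iterations=2):
    """Toy version of the STaR loop over (question, answer) pairs."""
    for _ in range(iterations):
        training_set = []
        for question, answer in data:
            # Step 1: attempt a rationale and answer without the label.
            rationale, predicted = model.generate(question)
            if predicted != answer:
                # Step 2: rationalize, conditioning on the correct answer.
                rationale, predicted = model.generate(question, hint=answer)
            if predicted == answer:
                # Step 3: keep only rationales that reached the right answer.
                training_set.append((question, rationale, answer))
        # Step 4: fine-tune on the collected rationales, then repeat.
        model = model.fine_tune(training_set)
    return model
```

In the first pass every question is answered via rationalization and memorized; in later passes the model answers directly, mirroring how STaR bootstraps from its own successful reasoning.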
