Paper Title
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters
Paper Authors
Paper Abstract
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs). CoT explicitly encourages the LLM to generate intermediate rationales for solving a problem, by providing a series of reasoning steps in the demonstrations. Despite its success, there is still little understanding of what makes CoT prompting effective and which aspects of the demonstrated reasoning steps contribute to its performance. In this paper, we show that CoT reasoning is possible even with invalid demonstrations - prompting with invalid reasoning steps can achieve over 80-90% of the performance obtained using CoT under various metrics, while still generating coherent lines of reasoning during inference. Further experiments show that other aspects of the rationales, such as being relevant to the query and correctly ordering the reasoning steps, are much more important for effective CoT reasoning. Overall, these findings both deepen our understanding of CoT prompting, and open up new questions regarding LLMs' capability to learn to reason in context.
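The contrast the abstract draws, between demonstrations with valid rationales and demonstrations whose reasoning steps are invalid yet still relevant and coherently ordered, can be sketched as prompt construction. This is a minimal illustrative example, not the paper's actual prompts or datasets; the problems, rationales, and prompt format below are assumptions chosen for clarity.

```python
# Hedged sketch of few-shot CoT prompt construction, in the spirit of the
# paper's ablation. The demonstrations here are invented for illustration.

def make_cot_prompt(demonstrations, query):
    """Concatenate (question, rationale, answer) demos, then the test query."""
    parts = []
    for question, rationale, answer in demonstrations:
        parts.append(f"Q: {question}\nA: {rationale} The answer is {answer}.")
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)

# A demonstration with a valid rationale.
valid_demo = (
    "Tom has 3 apples and buys 2 more. How many apples does he have?",
    "Tom starts with 3 apples. Buying 2 more gives 3 + 2 = 5.",
    "5",
)

# The same demonstration with an invalid rationale: the steps remain
# relevant to the question and coherently ordered, but the arithmetic
# in the intermediate step is wrong.
invalid_demo = (
    "Tom has 3 apples and buys 2 more. How many apples does he have?",
    "Tom starts with 3 apples. Buying 2 more gives 3 + 2 = 4.",
    "5",
)

query = "Sara has 7 books and gives away 4. How many books does she have?"

valid_prompt = make_cot_prompt([valid_demo], query)
invalid_prompt = make_cot_prompt([invalid_demo], query)
```

The paper's finding is that prompting an LLM with something like `invalid_prompt` recovers most of the performance of `valid_prompt`, suggesting that relevance and step ordering in the demonstrations matter more than the validity of the reasoning itself.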