Paper Title

Induced Natural Language Rationales and Interleaved Markup Tokens Enable Extrapolation in Large Language Models

Paper Authors

Mirelle Bueno, Carlos Gemmell, Jeffrey Dalton, Roberto Lotufo, Rodrigo Nogueira

Paper Abstract

The ability to extrapolate, i.e., to make predictions on sequences that are longer than those presented as training examples, is a challenging problem for current deep learning models. Recent work shows that this limitation persists in state-of-the-art Transformer-based models. Most solutions to this problem use specific architectures or training methods that do not generalize to other tasks. We demonstrate that large language models can succeed in extrapolation without modifying their architecture or training procedure. Our experimental results show that generating step-by-step rationales and introducing marker tokens are both required for effective extrapolation. First, we induce a language model to produce step-by-step rationales before outputting the answer to effectively communicate the task to the model. However, as sequences become longer, we find that current models struggle to keep track of token positions. To address this issue, we interleave output tokens with markup tokens that act as explicit positional and counting symbols. Our findings show how these two complementary approaches enable remarkable sequence extrapolation and highlight a limitation of current architectures to effectively generalize without explicit surface form guidance. Code available at https://github.com/MirelleB/induced-rationales-markup-tokens
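
To make the two ideas in the abstract concrete, below is a minimal, hypothetical sketch (not the paper's actual prompt format; see the repository above for that) of how an input/target pair could interleave explicit positional markup tokens with the output and spell out a step-by-step rationale before the final answer. The task (reversing a word list), the `<i>` tag style, and the helper names `with_markup` and `build_example` are illustrative assumptions.

```python
# Illustrative sketch of "step-by-step rationale + interleaved markup tokens".
# The task, tag format, and function names are assumptions for this example only.

def with_markup(tokens):
    """Interleave each token with an explicit positional markup symbol, e.g. '<1> cat <2> dog'."""
    return " ".join(f"<{i}> {tok}" for i, tok in enumerate(tokens, start=1))

def build_example(words):
    """Build an (input, target) pair whose target walks through the task step by step,
    referring to explicit positions, before emitting the final answer."""
    prompt = f"Reverse the list: {with_markup(words)}"
    steps = [
        f"step <{i}>: take item <{len(words) - i + 1}> -> {words[-i]}"
        for i in range(1, len(words) + 1)
    ]
    target = "\n".join(steps) + "\nanswer: " + " ".join(reversed(words))
    return prompt, target

if __name__ == "__main__":
    prompt, target = build_example(["cat", "dog", "bird", "fish"])
    print(prompt)   # Reverse the list: <1> cat <2> dog <3> bird <4> fish
    print(target)   # step-by-step rationale lines, then "answer: fish bird dog cat"
```

The design intent mirrors the abstract: the rationale communicates the procedure to the model, while the markup tokens give it explicit position and count symbols to track as sequences grow longer than those seen in training.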
