Locally Typical Sampling

Paper title

Locally Typical Sampling

Authors

Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell

Abstract

Today's probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics, e.g., perplexity. This discrepancy has puzzled the language generation community for the last few years. In this work, we posit that the abstraction of natural language generation as a discrete stochastic process--which allows for an information-theoretic analysis--can provide new insights into the behavior of probabilistic language generators, e.g., why high-probability texts can be dull or repetitive. Humans use language as a means of communicating information, aiming to do so in a simultaneously efficient and error-minimizing manner; in fact, psycholinguistics research suggests humans choose each word in a string with this subconscious goal in mind. We formally define the set of strings that meet this criterion: those for which each word has an information content close to the expected information content, i.e., the conditional entropy of our model. We then propose a simple and efficient procedure for enforcing this criterion when generating from probabilistic models, which we call locally typical sampling. Automatic and human evaluations show that, in comparison to nucleus and top-k sampling, locally typical sampling offers competitive performance (in both abstractive summarization and story generation) in terms of quality while consistently reducing degenerate repetitions.
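The criterion described in the abstract (keep words whose information content is close to the model's conditional entropy) can be illustrated with a minimal sketch. The abstract does not spell out the procedure, so the details here are assumptions: the function names `locally_typical_filter` and `sample_locally_typical`, the truncation parameter `mass`, and the rule of keeping the closest-to-entropy tokens until their cumulative probability reaches `mass` are illustrative choices, not a statement of the authors' exact implementation.

```python
import math
import random


def locally_typical_filter(probs, mass=0.9):
    """Truncate a next-token distribution to its most 'typical' tokens.

    probs: list of next-token probabilities that sum to 1.
    mass:  hypothetical truncation parameter (an assumption of this sketch):
           the cumulative probability the kept tokens must reach.
    Returns a renormalized distribution with the same length as probs.
    """
    # Conditional entropy of the next-token distribution, in nats.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)

    # Rank tokens by how far their information content (-log p)
    # lies from the entropy; the closest tokens come first.
    ranked = sorted(
        (abs(-math.log(p) - entropy), i) for i, p in enumerate(probs) if p > 0
    )

    # Keep the closest tokens until they cover the requested mass.
    kept, total = [], 0.0
    for _, i in ranked:
        kept.append(i)
        total += probs[i]
        if total >= mass:
            break

    # Zero out everything else and renormalize the kept tokens.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


def sample_locally_typical(probs, mass=0.9, rng=random):
    """Draw one token index from the truncated distribution."""
    filtered = locally_typical_filter(probs, mass)
    return rng.choices(range(len(probs)), weights=filtered, k=1)[0]


# Toy distribution over five tokens (made-up numbers for illustration).
example_probs = [0.5, 0.2, 0.15, 0.1, 0.05]
example_filtered = locally_typical_filter(example_probs, mass=0.3)
```

In this toy distribution, the most probable token (index 0) has an information content further from the entropy than tokens 1 and 2, so with a small `mass` it is excluded. That is the main contrast with nucleus and top-k sampling, which always retain the highest-probability token.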
