Paper Title
The future is different: Large pre-trained language models fail in prediction tasks
Paper Authors
Paper Abstract
Large pre-trained language models (LPLM) have shown spectacular success when fine-tuned on downstream supervised tasks. Yet, it is known that their performance can drastically drop when there is a distribution shift between the data used during training and that used at inference time. In this paper, we focus on data distributions that naturally change over time and introduce four new REDDIT datasets, namely the WALLSTREETBETS, ASKSCIENCE, THE DONALD, and POLITICS sub-reddits. First, we empirically demonstrate that LPLM can display average performance drops of about 88% (in the best case!) when predicting the popularity of future posts from sub-reddits whose topic distribution changes with time. We then introduce a simple methodology that leverages neural variational dynamic topic models and attention mechanisms to infer temporal language model representations for regression tasks. Our models display performance drops of only about 40% in the worst cases (2% in the best ones) when predicting the popularity of future posts, while using only about 7% of the total number of parameters of LPLM and providing interpretable representations that offer insight into real-world events, like the GameStop short squeeze of 2021.
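The abstract describes combining a dynamic topic model's time-dependent representation with attention over a post's text to regress popularity. Below is a minimal, hypothetical sketch of that general idea (not the authors' actual architecture or code): a topic vector inferred for the post's time step serves as the attention query over token embeddings, and the attended summary feeds a small regression head. All module names and dimensions are illustrative assumptions.

```python
# Illustrative sketch only: a time-dependent topic representation attends over
# a post's token embeddings, and the summary is regressed to a popularity score.
# Dimensions, names, and structure are assumptions, not taken from the paper.
import torch
import torch.nn as nn


class TemporalTopicAttentionRegressor(nn.Module):
    def __init__(self, embed_dim=128, topic_dim=50, num_heads=4):
        super().__init__()
        # Project the dynamic-topic-model representation (e.g. topic proportions
        # inferred for the post's time step) into the token-embedding space.
        self.topic_proj = nn.Linear(topic_dim, embed_dim)
        # Attention: the topic vector queries the post's token embeddings.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        # Small head that predicts a scalar (e.g. log-)popularity value.
        self.head = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, token_embs, topic_repr, key_padding_mask=None):
        # token_embs: (batch, seq_len, embed_dim) embeddings of the post's tokens
        # topic_repr: (batch, topic_dim) time-dependent topic representation
        query = self.topic_proj(topic_repr).unsqueeze(1)           # (batch, 1, embed_dim)
        summary, _ = self.attn(query, token_embs, token_embs,
                               key_padding_mask=key_padding_mask)  # (batch, 1, embed_dim)
        return self.head(summary.squeeze(1)).squeeze(-1)           # (batch,)


# Usage with random inputs, just to show the shapes involved.
model = TemporalTopicAttentionRegressor()
tokens = torch.randn(8, 32, 128)                      # 8 posts, 32 tokens each
topics = torch.softmax(torch.randn(8, 50), dim=-1)    # inferred topic proportions
print(model(tokens, topics).shape)                    # torch.Size([8])
```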