Paper Title
Efficient Transformers with Dynamic Token Pooling
Paper Authors
Paper Abstract
Transformers achieve unrivalled performance in modelling language, but remain inefficient in terms of memory and time complexity. A possible remedy is to reduce the sequence length in the intermediate layers by pooling fixed-length segments of tokens. Nevertheless, natural units of meaning, such as words or phrases, display varying sizes. To address this mismatch, we equip language models with a dynamic-pooling mechanism, which predicts segment boundaries in an autoregressive fashion. We compare several methods to infer boundaries, including end-to-end learning through stochastic re-parameterisation, supervised learning (based on segmentations from subword tokenizers or spikes in conditional entropy), as well as linguistically motivated boundaries. We perform character-level evaluation on texts from multiple datasets and morphologically diverse languages. The results demonstrate that dynamic pooling, which jointly segments and models language, is both faster and more accurate than vanilla Transformers and fixed-length pooling within the same computational budget.
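The core mechanism lends itself to a compact sketch. The following is a minimal, illustrative PyTorch version, not the authors' released implementation: a boundary predictor scores each token, hard 0/1 boundaries are sampled with a straight-through Gumbel-sigmoid during training, and tokens between consecutive boundaries are mean-pooled so the intermediate sequence becomes shorter. The names `BoundaryPredictor` and `pool_segments`, the temperature value, and the pooling details are assumptions for illustration.

```python
# Minimal sketch of dynamic token pooling (illustrative, not the
# authors' released code). A boundary of 1 at position i closes the
# segment ending at token i; tokens sharing a segment are mean-pooled.
import torch
import torch.nn as nn


class BoundaryPredictor(nn.Module):
    """Emits one boundary decision per position (assumed architecture)."""

    def __init__(self, d_model: int, temperature: float = 0.5):
        super().__init__()
        self.proj = nn.Linear(d_model, 1)
        self.temperature = temperature

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model) -> boundaries: (batch, seq_len)
        logits = self.proj(hidden).squeeze(-1)
        if self.training:
            # Gumbel-sigmoid relaxation: logistic noise plus a
            # straight-through estimator yields hard samples in the
            # forward pass and soft gradients in the backward pass.
            u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
            noise = torch.log(u) - torch.log1p(-u)  # Logistic(0, 1) noise
            soft = torch.sigmoid((logits + noise) / self.temperature)
            hard = (soft > 0.5).float()
            return hard + soft - soft.detach()
        return (torch.sigmoid(logits) > 0.5).float()


def pool_segments(hidden: torch.Tensor, boundaries: torch.Tensor) -> torch.Tensor:
    """Mean-pool tokens within each segment (single sequence for clarity).

    Note: this pooling uses hard segment ids and is not itself
    differentiable w.r.t. the boundaries; in the full model the
    predictor is trained through the relaxation above or supervision.
    """
    # Segment id of token i = number of boundaries strictly before i.
    seg_id = (torch.cumsum(boundaries, dim=0) - boundaries).long()
    num_seg = int(seg_id.max()) + 1
    pooled = hidden.new_zeros(num_seg, hidden.size(-1))
    counts = hidden.new_zeros(num_seg)
    pooled.index_add_(0, seg_id, hidden)
    counts.index_add_(0, seg_id, hidden.new_ones(seg_id.size(0)))
    return pooled / counts.unsqueeze(-1)


torch.manual_seed(0)
states = torch.randn(1, 12, 64)         # e.g. 12 character-level states
predictor = BoundaryPredictor(64).eval()
bounds = predictor(states)              # (1, 12) hard boundaries
shortened = pool_segments(states[0], bounds[0])
print(bounds[0].tolist(), shortened.shape)  # fewer pooled "tokens" than 12
```

The shortened sequence is what the intermediate Transformer layers would then process, which is where the reported speedup within a fixed computational budget comes from; at inference the predictor simply thresholds its probabilities, so segment sizes adapt to the input rather than being fixed in advance.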