Paper Title

Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection

Authors

Xin Huang, Ashish Khetan, Rene Bidart, Zohar Karnin

Abstract

Transformer-based language models such as BERT have achieved state-of-the-art performance on various NLP tasks, but are computationally prohibitive. A recent line of work uses various heuristics to successively shorten the sequence length while transforming tokens through the encoders, in tasks such as classification and ranking that require a single token embedding for prediction. We present a novel solution to this problem, called Pyramid-BERT, in which we replace the previously used heuristics with a core-set based token selection method justified by theoretical results. The core-set based token selection technique allows us to avoid expensive pre-training, enables space-efficient fine-tuning, and thus makes Pyramid-BERT suitable for handling longer sequence lengths. We provide extensive experiments establishing the advantages of Pyramid-BERT over several baselines and existing works on the GLUE benchmark and Long Range Arena datasets.
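To make the core-set idea in the abstract concrete, the sketch below shows a classic greedy k-center core-set construction applied to a sequence of token embeddings: it keeps a small subset of tokens such that every discarded token lies close to some retained one. This is only an illustration of the general core-set technique the abstract refers to, under assumed details (Euclidean distance, always retaining the first token as a stand-in for [CLS]); it is not the paper's exact selection procedure.

```python
import numpy as np

def greedy_core_set(embeddings: np.ndarray, k: int) -> list[int]:
    """Greedy k-center selection over token embeddings.

    Returns k token indices such that every remaining token is near
    some selected one. Illustration of a core-set construction, not
    the exact Pyramid-BERT procedure.
    """
    # Always keep the first token (e.g. the [CLS] token) as the seed.
    selected = [0]
    # Distance from each token to its nearest selected center so far.
    dists = np.linalg.norm(embeddings - embeddings[0], axis=1)
    while len(selected) < k:
        # Add the token farthest from the current core-set.
        idx = int(np.argmax(dists))
        selected.append(idx)
        dists = np.minimum(
            dists, np.linalg.norm(embeddings - embeddings[idx], axis=1)
        )
    return sorted(selected)

# Toy example: shrink a sequence of 8 token embeddings down to 4.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16))
kept = greedy_core_set(tokens, k=4)
print(kept)
```

Applying such a selection after each encoder layer yields the "pyramid" of successively shorter sequences the title alludes to, since attention cost shrinks with the square of the retained length.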
