Paper Title
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Paper Authors
Paper Abstract
Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks. However, these models often have billions of parameters, and, thus, are too resource-hungry and computation-intensive to suit low-capability devices or applications with strict latency requirements. One potential remedy for this is model compression, which has attracted a lot of research attention. Here, we summarize the research on compressing Transformers, focusing on the especially popular BERT model. In particular, we survey the state of the art in compression for BERT, we clarify the current best practices for compressing large-scale Transformer models, and we provide insights into the workings of various methods. Our categorization and analysis also shed light on promising future research directions for achieving lightweight, accurate, and generic NLP models.
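The abstract's claim that compression can tame resource-hungry Transformers is easy to make concrete. Below is a minimal, hedged sketch of one compression family such a survey covers, post-training dynamic quantization; the checkpoint name, the toolchain (PyTorch and Hugging Face transformers), and the size comparison are illustrative assumptions for this note, not an experiment from the paper.

```python
# Hedged sketch: post-training dynamic quantization of a BERT checkpoint.
# Assumptions (not from the paper): PyTorch and Hugging Face `transformers`
# are installed, and "bert-base-uncased" stands in for "a large model".
import os

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")

# Swap the fp32 weights of every nn.Linear for int8 weights that are
# dequantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def size_mb(module: torch.nn.Module, path: str = "tmp_state.pt") -> float:
    """Serialized size of a module's state dict, in megabytes."""
    torch.save(module.state_dict(), path)
    size = os.path.getsize(path) / 1e6
    os.remove(path)
    return size

print(f"fp32 checkpoint: {size_mb(model):.1f} MB")
print(f"int8 checkpoint: {size_mb(quantized):.1f} MB")
```

Since int8 weights take a quarter of the space of fp32 weights, this typically shrinks the Linear-layer storage by roughly 4x, at some cost in accuracy; whether that trade-off is acceptable depends on the target device and latency budget, which is precisely the design space the survey maps out.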