Paper Title

Sparse*BERT: Sparse Models Generalize To New Tasks and Domains

Paper Authors

Daniel Campos, Alexandre Marques, Tuan Nguyen, Mark Kurtz, ChengXiang Zhai

Paper Abstract

Large Language Models have become the core architecture upon which most modern natural language processing (NLP) systems build. These models can consistently deliver impressive accuracy and robustness across tasks and domains, but their high computational overhead can make inference difficult and expensive. To make using these models less costly, recent work has explored leveraging structured and unstructured pruning, quantization, and distillation to improve inference speed and decrease size. This paper studies how models pruned using Gradual Unstructured Magnitude Pruning can transfer between domains and tasks. Our experimentation shows that models that are pruned during pretraining using general domain masked language models can transfer to novel domains and tasks without extensive hyperparameter exploration or specialized approaches. We demonstrate that our general sparse model Sparse*BERT can become SparseBioBERT simply by pretraining the compressed architecture on unstructured biomedical text. Moreover, we show that SparseBioBERT can match the quality of BioBERT with only 10% of the parameters.
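To make the pruning method concrete, below is a minimal sketch of gradual unstructured magnitude pruning in PyTorch. This is an illustration only, not the paper's implementation: the cubic sparsity schedule (in the style of Zhu & Gupta, 2017), the step counts, and the 90% final sparsity are assumed values chosen for exposition.

```python
import torch


def target_sparsity(step, start_step, end_step, final_sparsity, initial_sparsity=0.0):
    """Cubic sparsity schedule often used for gradual magnitude pruning:
    sparsity ramps from initial_sparsity to final_sparsity between
    start_step and end_step, then stays constant."""
    if step <= start_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - start_step) / (end_step - start_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_prune_(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude entries of `weight` in place and
    return the boolean mask of surviving weights."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight, dtype=torch.bool)
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    weight.mul_(mask)
    return mask


# Illustrative use inside a pretraining loop (model, optimizer, loss_fn, and
# dataloader are assumed to exist; hyperparameters are placeholders):
#
# for step, batch in enumerate(dataloader):
#     loss = loss_fn(model(batch))
#     loss.backward()
#     optimizer.step()
#     optimizer.zero_grad()
#     s = target_sparsity(step, start_step=1_000, end_step=20_000, final_sparsity=0.9)
#     for name, param in model.named_parameters():
#         if "weight" in name and param.dim() >= 2:  # prune only weight matrices
#             magnitude_prune_(param.data, s)
```

Because the mask is re-derived from weight magnitudes after every update, weights pruned early can in principle regrow, while the schedule guarantees the network ends pretraining at the target sparsity; the resulting sparse architecture is what the paper then transfers to new domains and tasks.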
