Paper Title
DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
Paper Authors
Paper Abstract
With the ever-growing size of pretrained models (PMs), fine-tuning them has become more expensive and resource-hungry. As a remedy, low-rank adapters (LoRA) keep the main pretrained weights of the model frozen and just introduce some learnable truncated SVD modules (so-called LoRA blocks) to the model. While LoRA blocks are parameter-efficient, they suffer from two major problems: first, the size of these blocks is fixed and cannot be modified after training (for example, if we need to change the rank of LoRA blocks, then we need to re-train them from scratch); second, optimizing their rank requires an exhaustive search and effort. In this work, we introduce a dynamic low-rank adaptation (DyLoRA) technique to address these two problems together. Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting the representation learned by the adapter module at different ranks during training. We evaluate our solution on different natural language understanding (GLUE benchmark) and language generation tasks (E2E, DART, and WebNLG) using different pretrained models such as RoBERTa and GPT of different sizes. Our results show that we can train dynamic search-free models with DyLoRA at least 4 to 7 times (depending on the task) faster than LoRA without significantly compromising performance. Moreover, our models can perform consistently well on a much larger range of ranks compared to LoRA.
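To make the idea concrete, here is a minimal NumPy sketch (not the authors' implementation; dimensions, names, and the sampling scheme are illustrative) of a LoRA block whose forward pass can be truncated to any rank b ≤ r_max, which is the property DyLoRA trains for by sampling a different rank at each step:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions; r_max is the maximum LoRA rank trained for.
d_in, d_out, r_max = 8, 6, 4
W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((r_max, d_in)) * 0.1  # LoRA down-projection
B = rng.standard_normal((d_out, r_max)) * 0.1 # LoRA up-projection

def dylora_forward(x, b):
    """Forward pass truncated to rank b: only the first b rows of A and
    the first b columns of B contribute, so a single set of adapter
    weights can serve every rank 1..r_max at inference time."""
    return W @ x + B[:, :b] @ (A[:b, :] @ x)

x = rng.standard_normal(d_in)

# Truncating to the maximum rank recovers the full LoRA update.
full = W @ x + B @ (A @ x)
assert np.allclose(dylora_forward(x, r_max), full)

# In training, each step would sample b uniformly from {1, ..., r_max}
# and backpropagate through the truncated forward pass, so lower-rank
# sub-blocks keep working after training (the "dynamic" property).
b = int(rng.integers(1, r_max + 1))
y = dylora_forward(x, b)
```

Because the rank-b forward pass only reads a prefix of A and B, shrinking the adapter after deployment is a slicing operation rather than a retraining run, which is the contrast with plain LoRA drawn in the abstract.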