Visual Prompt Tuning

Paper Title

Visual Prompt Tuning

Authors

Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, Ser-Nam Lim

Abstract

The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small number of trainable parameters (less than 1% of model parameters) in the input space while keeping the model backbone frozen. Via extensive experiments on a wide variety of downstream recognition tasks, we show that VPT achieves significant performance gains compared to other parameter-efficient tuning protocols. Most importantly, VPT even outperforms full fine-tuning in many cases across model capacities and training data scales, while reducing per-task storage cost.
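To make the "less than 1% of model parameters" claim concrete, the following is a minimal sketch and not the authors' implementation. It assumes illustrative sizes that the abstract does not state (a hidden width of 768, roughly 86 million backbone parameters, and 50 prompt tokens), assumes the prompts are inserted between a class token and the patch tokens, and leaves out any task-specific classification head. All function names are hypothetical.

```python
def prompt_param_count(num_prompts, hidden_dim, num_layers=1):
    """Trainable prompt parameters: one hidden_dim-sized vector
    per prompt token per prompted layer."""
    return num_prompts * hidden_dim * num_layers


def prompt_fraction(num_prompts, hidden_dim, backbone_params, num_layers=1):
    """Prompt parameters as a fraction of the frozen backbone's parameters."""
    return prompt_param_count(num_prompts, hidden_dim, num_layers) / backbone_params


def prepend_prompts(cls_token, prompts, patch_tokens):
    """Build the input sequence with learnable prompts added in the input space.
    The ordering (class token, prompts, patches) is an assumption of this sketch."""
    return [cls_token] + list(prompts) + list(patch_tokens)


# Assumed, illustrative sizes (not taken from the abstract).
HIDDEN_DIM = 768
BACKBONE_PARAMS = 86_000_000
NUM_PROMPTS = 50

fraction = prompt_fraction(NUM_PROMPTS, HIDDEN_DIM, BACKBONE_PARAMS)
print(f"prompt parameters: {prompt_param_count(NUM_PROMPTS, HIDDEN_DIM)}")
print(f"fraction of backbone: {fraction:.4%}")

# The sequence grows by the number of prompts; the backbone weights are untouched.
sequence = prepend_prompts("cls", ["p1", "p2"], ["e1", "e2", "e3"])
print(sequence)
```

Under these assumptions only the prompt vectors (and, in practice, a small task head) would need to be stored per task, which is the source of the reduced per-task storage cost mentioned in the abstract.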
