论文标题
视觉提示调整
Visual Prompt Tuning
论文作者
论文摘要
当前在改编预训练模型中的作案手法涉及更新所有骨干参数,即完整的微调。本文介绍了视觉及时调整(VPT),作为视觉中大规模变压器模型的全面微调的有效替代方案。从最近有效地调整大型语言模型的最新进展中汲取灵感,VPT仅引入了输入空间中可训练参数的少量(少于模型参数的1%),同时保持模型骨架冻结。通过对各种下游识别任务的广泛实验,我们表明VPT与其他参数有效调整协议相比,实现了显着的性能增长。最重要的是,在许多情况下,VPT甚至在模型能力和培训数据量表的许多情况下都胜过全面的微调,同时降低了每任务的存储成本。
The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, ie, full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen. Via extensive experiments on a wide variety of downstream recognition tasks, we show that VPT achieves significant performance gains compared to other parameter efficient tuning protocols. Most importantly, VPT even outperforms full fine-tuning in many cases across model capacities and training data scales, while reducing per-task storage cost.