Paper Title
Scalable Transfer Learning with Expert Models
Paper Authors
Paper Abstract
Transfer of pre-trained representations can improve sample efficiency and reduce computational requirements for new tasks. However, representations used for transfer are usually generic, and are not tailored to a particular distribution of downstream tasks. We explore the use of expert representations for transfer with a simple, yet effective, strategy. We train a diverse set of experts by exploiting existing label structures, and use cheap-to-compute performance proxies to select the relevant expert for each target task. This strategy scales the process of transferring to new tasks, since it does not revisit the pre-training data during transfer. Accordingly, it requires little extra compute per target task, and results in a speed-up of 2-3 orders of magnitude compared to competing approaches. Further, we provide an adapter-based architecture able to compress many experts into a single model. We evaluate our approach on two different data sources and demonstrate that it outperforms baselines on over 20 diverse vision tasks in both cases.
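As a concrete illustration of the selection step, below is a minimal sketch in Python/NumPy of picking an expert for a target task via a k-nearest-neighbour accuracy proxy, the kind of cheap-to-compute performance proxy the abstract describes. The `experts` mapping, `knn_proxy_accuracy`, and `select_expert` names are hypothetical; each expert is assumed to expose a frozen feature extractor, so ranking candidates requires only forward passes, with no fine-tuning and no access to the pre-training data.

```python
import numpy as np

def knn_proxy_accuracy(feats_train, labels_train, feats_val, labels_val, k=1):
    """Cheap proxy: kNN accuracy of frozen expert features on the target task.

    Labels are assumed to be non-negative integer class ids.
    """
    # Pairwise squared Euclidean distances between val and train features.
    d2 = (
        np.sum(feats_val ** 2, axis=1, keepdims=True)
        - 2.0 * feats_val @ feats_train.T
        + np.sum(feats_train ** 2, axis=1)
    )
    nearest = np.argsort(d2, axis=1)[:, :k]      # indices of k nearest train points
    neighbor_labels = labels_train[nearest]       # shape: (num_val, k)
    # Majority vote over the k neighbours for each validation example.
    preds = np.array([np.bincount(row).argmax() for row in neighbor_labels])
    return float(np.mean(preds == labels_val))

def select_expert(experts, x_train, y_train, x_val, y_val, k=1):
    """Rank experts by the kNN proxy and return the best one.

    `experts` maps an expert name to a callable that embeds a batch of
    inputs into a (num_examples, dim) feature array.
    """
    scores = {}
    for name, extract_features in experts.items():
        f_train = extract_features(x_train)
        f_val = extract_features(x_val)
        scores[name] = knn_proxy_accuracy(f_train, y_train, f_val, y_val, k=k)
    best = max(scores, key=scores.get)
    return best, scores
```

In this reading of the abstract, the proxy only ranks candidate experts; the selected expert would then be transferred (e.g., fine-tuned) on the target task as usual, which is why the per-task overhead stays small.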