Paper Title
Using Language to Extend to Unseen Domains
Paper Authors
Paper Abstract
It is expensive to collect training data for every possible domain that a vision model may encounter when deployed. We instead consider how simply verbalizing the training domain (e.g. "photos of birds") as well as domains we want to extend to but do not have data for (e.g. "paintings of birds") can improve robustness. Using a multimodal model with a joint image and language embedding space, our method LADS learns a transformation of the image embeddings from the training domain to each unseen test domain, while preserving task relevant information. Without using any images from the unseen test domain, we show that over the extended domain containing both training and unseen test domains, LADS outperforms standard fine-tuning and ensemble approaches over a suite of four benchmarks targeting domain adaptation and dataset bias.
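Below is a minimal sketch of the idea described in the abstract: learn a transformation of training-domain image embeddings toward a domain that is only described in language, while keeping class information intact. It assumes precomputed, L2-normalized CLIP image embeddings `z` with labels `y`, CLIP text embeddings for the training-domain prompt (`t_src`, e.g. "a photo of a bird") and the unseen-domain prompt (`t_tgt`, e.g. "a painting of a bird"), and a classifier already fit on the training domain. The names (`Augmenter`, `train_augmenter`, `lam`) and the exact loss forms are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch only; names and losses are assumptions, not the LADS code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Augmenter(nn.Module):
    """Maps a training-domain CLIP image embedding toward the verbalized unseen domain."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, z):
        # Residual connection keeps the transformed embedding close to the original image content.
        return F.normalize(z + self.net(z), dim=-1)


def train_augmenter(z, y, t_src, t_tgt, classifier, dim=512, steps=1000, lam=1.0):
    aug = Augmenter(dim)
    opt = torch.optim.Adam(aug.parameters(), lr=1e-3)
    # Language-defined domain shift: direction from the source prompt to the target prompt.
    direction = F.normalize(t_tgt - t_src, dim=-1)
    for _ in range(steps):
        z_aug = aug(z)
        # (1) Domain alignment: the embedding shift should follow the text-derived direction.
        shift = F.normalize(z_aug - z, dim=-1)
        loss_domain = 1.0 - (shift * direction).sum(-1).mean()
        # (2) Class consistency: a classifier trained on the source domain should still
        #     assign the original label to the transformed embedding (task-relevant info kept).
        loss_class = F.cross_entropy(classifier(z_aug), y)
        loss = loss_domain + lam * loss_class
        opt.zero_grad()
        loss.backward()
        opt.step()
    return aug
```

Under these assumptions, the final classifier would then be trained on the union of the original and transformed embeddings, i.e. on the "extended domain" the abstract refers to, without ever seeing an image from the unseen test domain.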