Paper Title

Invariant Learning via Diffusion Dreamed Distribution Shifts

Paper Authors

Priyatham Kattakinda, Alexander Levine, Soheil Feizi

Paper Abstract

Though the background is an important signal for image classification, over-reliance on it can lead to incorrect predictions when spurious correlations between foreground and background are broken at test time. Training on a dataset where these correlations are unbiased would lead to more robust models. In this paper, we propose such a dataset called Diffusion Dreamed Distribution Shifts (D3S). D3S consists of synthetic images generated through StableDiffusion using text prompts and image guides obtained by pasting a sample foreground image onto a background template image. Using this scalable approach, we generate 120K images of objects from all 1000 ImageNet classes in 10 diverse backgrounds. Due to the incredible photorealism of the diffusion model, our images are much closer to natural images than previous synthetic datasets. D3S contains a validation set of more than 17K images whose labels are human-verified in an MTurk study. Using the validation set, we evaluate several popular DNN image classifiers and find that the classification performance of models generally suffers on our background-diverse images. Next, we leverage the foreground and background labels in D3S to learn a foreground (background) representation that is invariant to changes in background (foreground) by penalizing the mutual information between the foreground (background) features and the background (foreground) labels. Linear classifiers trained on these features to predict foreground (background) from foreground (background) have high accuracies of 82.9% (93.8%), while classifiers that predict these labels from background and foreground have much lower accuracies of 2.4% and 45.6%, respectively. This suggests that our foreground and background features are well disentangled. We further test the efficacy of these representations by training classifiers on a task with strong spurious correlations.
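
Below is a minimal, hypothetical PyTorch sketch of the kind of objective the abstract describes: a backbone produces foreground features trained to predict the object class while being discouraged from revealing the background label. The abstract only states that mutual information between foreground features and background labels is penalized; the gradient-reversal adversary used here is a stand-in proxy for that penalty, and all names and class counts (InvariantModel, bg_probe, feat_dim, etc.) are illustrative assumptions, not the authors' implementation. The symmetric branch for a background representation invariant to the foreground would be analogous.

```python
# Hypothetical sketch (not the authors' code): learn a foreground representation
# that (i) predicts the foreground label and (ii) is penalized for carrying
# information about the background label. The MI penalty is approximated here
# with an adversarial background classifier behind a gradient-reversal layer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None


class InvariantModel(nn.Module):
    def __init__(self, feat_dim=512, num_fg=1000, num_bg=10):
        super().__init__()
        # Placeholder backbone; any feature extractor producing feat_dim features works.
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.fg_head = nn.Linear(feat_dim, num_fg)   # predicts the object (foreground) class
        self.bg_probe = nn.Linear(feat_dim, num_bg)  # adversary: predicts the background class

    def forward(self, x, lam=1.0):
        z_fg = self.backbone(x)
        fg_logits = self.fg_head(z_fg)
        # The adversary sees the same features, but gradients flowing back into the
        # backbone are reversed, pushing z_fg to carry no background information.
        bg_logits = self.bg_probe(GradReverse.apply(z_fg, lam))
        return fg_logits, bg_logits


def training_step(model, images, fg_labels, bg_labels, lam=1.0):
    fg_logits, bg_logits = model(images, lam)
    return F.cross_entropy(fg_logits, fg_labels) + F.cross_entropy(bg_logits, bg_labels)


if __name__ == "__main__":
    model = InvariantModel()
    x = torch.randn(4, 3, 32, 32)
    fg = torch.randint(0, 1000, (4,))
    bg = torch.randint(0, 10, (4,))
    loss = training_step(model, x, fg, bg)
    loss.backward()
    print(float(loss))
```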
