Paper Title

Improving Zero-Shot Models with Label Distribution Priors

Paper Authors

Jonathan Kahana, Niv Cohen, Yedid Hoshen

Paper Abstract

Labeling large image datasets with attributes such as facial age or object type is tedious and sometimes infeasible. Supervised machine learning methods provide a highly accurate solution, but require manual labels which are often unavailable. Zero-shot models (e.g., CLIP) do not require manual labels but are not as accurate as supervised ones, particularly when the attribute is numeric. We propose a new approach, CLIPPR (CLIP with Priors), which adapts zero-shot models for regression and classification on unlabelled datasets. Our method does not use any annotated images. Instead, we assume a prior over the label distribution in the dataset. We then train an adapter network on top of CLIP under two competing objectives: i) minimal change of predictions from the original CLIP model; and ii) minimal distance between the predicted and prior distribution of labels. Additionally, we present a novel approach for selecting prompts for Vision & Language models using a distributional prior. Our method is effective and presents a significant improvement over the original model. We demonstrate an improvement of 28% in mean absolute error on the UTK age regression task. We also present promising results for classification benchmarks, improving the classification accuracy on the ImageNet dataset by 2.83%, without using any labels.
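As a rough illustration of the two competing objectives in the abstract (classification case), the sketch below trains a small adapter on top of frozen CLIP image embeddings. The adapter architecture, the KL-based formulation of both losses, and the weighting factor `lam` are assumptions made for illustration only, not the authors' exact implementation.

```python
# A minimal sketch of the two competing objectives described in the abstract
# (classification case). The adapter design, KL-based losses, and loss
# weighting are illustrative assumptions, not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Small MLP mapping frozen CLIP image embeddings to class logits."""
    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def clippr_style_loss(adapter_logits: torch.Tensor,
                      clip_logits: torch.Tensor,
                      prior: torch.Tensor,
                      lam: float = 1.0) -> torch.Tensor:
    """Combine (i) consistency with the frozen zero-shot predictions and
    (ii) a match between the batch-level predicted label distribution
    and the assumed prior. `prior` is a probability vector, e.g.
    prior = torch.full((num_classes,), 1.0 / num_classes) for a uniform prior.
    """
    # (i) minimal change of predictions from the original CLIP model
    consistency = F.kl_div(
        F.log_softmax(adapter_logits, dim=-1),
        F.softmax(clip_logits, dim=-1),
        reduction="batchmean",
    )
    # (ii) minimal distance between predicted and prior label distributions,
    # measured on the average prediction over the batch
    batch_dist = F.softmax(adapter_logits, dim=-1).mean(dim=0)
    prior_match = F.kl_div(batch_dist.log(), prior, reduction="sum")
    return consistency + lam * prior_match
```

Balancing the two terms lets the adapter redistribute predictions toward the assumed prior while staying anchored to the original zero-shot model, which is the core trade-off the abstract describes.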
