Paper Title

Improving Zero-Shot Models with Label Distribution Priors

Paper Authors

Jonathan Kahana, Niv Cohen, Yedid Hoshen

Paper Abstract

Labeling large image datasets with attributes such as facial age or object type is tedious and sometimes infeasible. Supervised machine learning methods provide a highly accurate solution, but require manual labels which are often unavailable. Zero-shot models (e.g., CLIP) do not require manual labels but are not as accurate as supervised ones, particularly when the attribute is numeric. We propose a new approach, CLIPPR (CLIP with Priors), which adapts zero-shot models for regression and classification on unlabelled datasets. Our method does not use any annotated images. Instead, we assume a prior over the label distribution in the dataset. We then train an adapter network on top of CLIP under two competing objectives: i) minimal change of predictions from the original CLIP model; and ii) minimal distance between the predicted and prior distribution of labels. Additionally, we present a novel approach for selecting prompts for Vision & Language models using a distributional prior. Our method is effective and presents a significant improvement over the original model. We demonstrate an improvement of 28% in mean absolute error on the UTK age regression task. We also present promising results for classification benchmarks, improving the classification accuracy on the ImageNet dataset by 2.83%, without using any labels.
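As a rough illustration of the two competing objectives in the abstract (classification case), the sketch below trains a small adapter on top of frozen CLIP image embeddings. The adapter architecture, the KL-based formulation of both losses, and the weighting factor `lam` are assumptions made for illustration only, not the authors' exact implementation.

```python
# A minimal sketch of the two competing objectives described in the abstract
# (classification case). The adapter design, KL-based losses, and loss
# weighting are illustrative assumptions, not the paper's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Small MLP mapping frozen CLIP image embeddings to class logits."""
    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def clippr_style_loss(adapter_logits: torch.Tensor,
                      clip_logits: torch.Tensor,
                      prior: torch.Tensor,
                      lam: float = 1.0) -> torch.Tensor:
    """Combine (i) consistency with the frozen zero-shot predictions and
    (ii) a match between the batch-level predicted label distribution
    and the assumed prior. `prior` is a probability vector, e.g.
    prior = torch.full((num_classes,), 1.0 / num_classes) for a uniform prior.
    """
    # (i) minimal change of predictions from the original CLIP model
    consistency = F.kl_div(
        F.log_softmax(adapter_logits, dim=-1),
        F.softmax(clip_logits, dim=-1),
        reduction="batchmean",
    )
    # (ii) minimal distance between predicted and prior label distributions,
    # measured on the average prediction over the batch
    batch_dist = F.softmax(adapter_logits, dim=-1).mean(dim=0)
    prior_match = F.kl_div(batch_dist.log(), prior, reduction="sum")
    return consistency + lam * prior_match
```

Balancing the two terms lets the adapter redistribute predictions toward the assumed prior while staying anchored to the original zero-shot model, which is the core trade-off the abstract describes.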
