Paper Title

BARACK: Partially Supervised Group Robustness With Guarantees

Authors

Sohoni, Nimit S., Sanjabi, Maziar, Ballas, Nicolas, Grover, Aditya, Nie, Shaoliang, Firooz, Hamed, Ré, Christopher

Abstract

While neural networks have shown remarkable success on classification tasks in terms of average-case performance, they often fail to perform well on certain groups of the data. Such group information may be expensive to obtain; thus, recent works in robustness and fairness have proposed ways to improve worst-group performance even when group labels are unavailable for the training data. However, these methods generally underperform methods that utilize group information at training time. In this work, we assume access to a small number of group labels alongside a larger dataset without group labels. We propose BARACK, a simple two-step framework to utilize this partial group information to improve worst-group performance: train a model to predict the missing group labels for the training data, and then use these predicted group labels in a robust optimization objective. Theoretically, we provide generalization bounds for our approach in terms of the worst-group performance, which scale with respect to both the total number of training points and the number of training points with group labels. Empirically, our method outperforms the baselines that do not use group information, even when only 1-33% of points have group labels. We provide ablation studies to support the robustness and extensibility of our framework.
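To make the two-step framework described above more concrete, the sketch below illustrates one plausible instantiation in PyTorch: a group-label classifier is first fit on the small group-labeled subset, its predictions fill in the missing group labels, and a group-DRO-style objective (exponentiated-gradient reweighting over the predicted groups) then trains the final model. This is a minimal sketch under assumed interfaces; the function names (`train_group_predictor`, `predict_groups`, `group_dro_loss`), data loaders, and hyperparameters are illustrative and not the authors' released implementation.

```python
# Minimal sketch of a BARACK-style two-step pipeline (illustrative only;
# names and interfaces are assumptions, not the paper's official code).
import torch
import torch.nn.functional as F


def train_group_predictor(model, labeled_loader, epochs=5, lr=1e-3):
    """Step 1: fit a classifier on the small subset that has group labels."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, g in labeled_loader:            # g: observed group labels
            opt.zero_grad()
            F.cross_entropy(model(x), g).backward()
            opt.step()
    return model


@torch.no_grad()
def predict_groups(model, unlabeled_loader):
    """Fill in missing group labels for the rest of the training data."""
    return torch.cat([model(x).argmax(dim=1) for x, _ in unlabeled_loader])


def group_dro_loss(logits, y, g_hat, num_groups, group_weights, eta=0.01):
    """Step 2: a group-DRO-style objective over *predicted* groups g_hat.

    group_weights is updated multiplicatively (exponentiated gradient),
    so groups with higher current loss get upweighted.
    """
    losses = F.cross_entropy(logits, y, reduction="none")
    group_losses = torch.stack([
        losses[g_hat == k].mean() if (g_hat == k).any() else losses.new_zeros(())
        for k in range(num_groups)
    ])
    group_weights = group_weights * torch.exp(eta * group_losses.detach())
    group_weights = group_weights / group_weights.sum()
    return (group_weights * group_losses).sum(), group_weights
```

In a training loop, `group_dro_loss` would replace the usual average cross-entropy: compute `loss, group_weights = group_dro_loss(model(x), y, g_hat[idx], num_groups, group_weights)` each step, backpropagate `loss`, and carry `group_weights` across iterations.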
