Paper Title

Learning to Defer to Multiple Experts: Consistent Surrogate Losses, Confidence Calibration, and Conformal Ensembles

Paper Authors

Rajeev Verma, Daniel Barrejón, Eric Nalisnick

Paper Abstract

We study the statistical properties of learning to defer (L2D) to multiple experts. In particular, we address the open problems of deriving a consistent surrogate loss, confidence calibration, and principled ensembling of experts. Firstly, we derive two consistent surrogates -- one based on a softmax parameterization, the other on a one-vs-all (OvA) parameterization -- that are analogous to the single-expert losses proposed by Mozannar and Sontag (2020) and Verma and Nalisnick (2022), respectively. We then study the frameworks' ability to estimate P( m_j = y | x ), the probability that the jth expert will correctly predict the label for x. Theory shows the softmax-based loss causes mis-calibration to propagate between the estimates while the OvA-based loss does not (though in practice, we find there are trade-offs). Lastly, we propose a conformal inference technique that chooses a subset of experts to query when the system defers. We perform empirical validation on tasks for galaxy, skin lesion, and hate speech classification.
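
To make the multi-expert setup concrete, below is a minimal sketch (not the authors' code) of how a softmax-parameterized L2D surrogate of the kind described in the abstract can be written down: the model produces K class logits plus J "defer to expert j" logits, and each expert that is correct on the input contributes an extra cross-entropy term toward its deferral output. All function and variable names here are illustrative assumptions, and the exact surrogate used in the paper may differ in detail.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def l2d_softmax_surrogate(logits, y, expert_preds):
    """Sketch of a softmax-parameterized L2D surrogate for multiple experts.

    logits: array of shape (K + J,); the first K entries score the classes,
            the last J entries score "defer to expert j".
    y: true label in {0, ..., K-1}.
    expert_preds: array of shape (J,) with the experts' predictions m_1, ..., m_J.
    """
    K = logits.shape[0] - expert_preds.shape[0]
    p = softmax(logits)
    loss = -np.log(p[y])                     # classifier term for the true label
    for j, m_j in enumerate(expert_preds):
        if m_j == y:                         # extra term only for experts that are correct
            loss += -np.log(p[K + j])
    return loss

# toy usage: 3 classes, 2 experts; expert 0 is correct, expert 1 is not
logits = np.array([1.2, 0.3, -0.5, 0.8, -1.0])
print(l2d_softmax_surrogate(logits, y=0, expert_preds=np.array([0, 2])))
```

In practice one would average this loss over a minibatch and optimize it with standard gradient-based training; the OvA alternative mentioned in the abstract would instead use one binary score per class and per expert.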
