Paper Title
Certifiably Adversarially Robust Detection of Out-of-Distribution Data
Paper Authors
Paper Abstract
Deep neural networks are known to be overconfident when applied to out-of-distribution (OOD) inputs which clearly do not belong to any class. This is a problem in safety-critical applications, since a reliable assessment of the uncertainty of a classifier is a key property that allows the system to trigger human intervention or to transfer into a safe state. In this paper, we aim for certifiable worst-case guarantees for OOD detection by enforcing low confidence not only at the OOD point but also in an $l_\infty$-ball around it. For this purpose, we use interval bound propagation (IBP) to upper bound the maximal confidence in the $l_\infty$-ball and minimize this upper bound during training. We show that non-trivial bounds on the confidence for OOD data, generalizing beyond the OOD dataset seen at training time, are possible. Moreover, in contrast to certified adversarial robustness, which typically comes with a significant loss in prediction performance, certified guarantees for worst-case OOD detection are possible without much loss in accuracy.
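The core mechanism of the abstract, upper-bounding the maximal softmax confidence over an $l_\infty$-ball via interval bound propagation and minimizing that bound on OOD data during training, can be sketched as follows. This is a minimal illustration assuming a small fully-connected ReLU network; the function names (`ibp_bounds`, `worst_case_confidence`), the per-class bounding scheme, and the loss weighting are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal IBP sketch: certify an upper bound on the maximal softmax
# confidence over an l_infty-ball and penalize it on OOD inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

def ibp_bounds(model, x, eps):
    """Propagate elementwise lower/upper bounds through Linear/ReLU layers."""
    lb, ub = x - eps, x + eps
    for layer in model:
        if isinstance(layer, nn.Linear):
            mu, r = (ub + lb) / 2, (ub - lb) / 2   # center and radius
            mu = layer(mu)                          # W mu + b
            r = F.linear(r, layer.weight.abs())     # |W| r
            lb, ub = mu - r, mu + r
        elif isinstance(layer, nn.ReLU):
            lb, ub = lb.clamp(min=0), ub.clamp(min=0)
        else:
            raise NotImplementedError(type(layer))
    return lb, ub

def worst_case_confidence(lb, ub):
    """Upper bound on max_k softmax_k over the ball: for each class k,
    set its logit to the upper bound and all other logits to their lower
    bounds, then take the maximum over k."""
    confs = []
    for k in range(lb.shape[-1]):
        logits = lb.clone()
        logits[:, k] = ub[:, k]
        confs.append(logits.softmax(dim=-1)[:, k])
    return torch.stack(confs, dim=-1).max(dim=-1).values

# Hypothetical training step (shapes and weight 0.5 are assumptions):
# standard cross-entropy on in-distribution data plus a penalty on the
# certified worst-case OOD confidence.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
x_in, y_in = torch.rand(32, 784), torch.randint(0, 10, (32,))
x_ood = torch.rand(32, 784)
lb, ub = ibp_bounds(model, x_ood, eps=0.01)
loss = F.cross_entropy(model(x_in), y_in) \
     + 0.5 * worst_case_confidence(lb, ub).log().mean()
loss.backward()
```

Because IBP gives a sound outer approximation of the network's reachable logits, minimizing this upper bound during training guarantees (not merely encourages) low confidence on every point in the ball around each OOD training input.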