Paper Title

Counterbalancing Teacher: Regularizing Batch Normalized Models for Robustness

Paper Authors

Saeid Asgari Taghanaki, Ali Gholami, Fereshte Khani, Kristy Choi, Linh Tran, Ran Zhang, Aliasghar Khani

Paper Abstract

Batch normalization (BN) is a ubiquitous technique for training deep neural networks that accelerates their convergence to reach higher accuracy. However, we demonstrate that BN comes with a fundamental drawback: it incentivizes the model to rely on low-variance features that are highly specific to the training (in-domain) data, hurting generalization performance on out-of-domain examples. In this work, we investigate this phenomenon by first showing that removing BN layers across a wide range of architectures leads to lower out-of-domain and corruption errors at the cost of higher in-domain errors. We then propose Counterbalancing Teacher (CT), a method which leverages a frozen copy of the same model without BN as a teacher to enforce the student network's learning of robust representations by substantially adapting its weights through a consistency loss function. This regularization signal helps CT perform well in unforeseen data shifts, even without information from the target domain as in prior works. We theoretically show in an overparameterized linear regression setting why normalization leads to a model's reliance on such in-domain features, and empirically demonstrate the efficacy of CT by outperforming several baselines on robustness benchmarks such as CIFAR-10-C, CIFAR-100-C, and VLCS.
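To make the consistency-loss idea described above concrete, here is a minimal PyTorch sketch. The tiny convolutional backbone, the use of MSE between student and teacher logits, and the weighting factor `lam` are illustrative assumptions; the abstract does not specify how the BN-free teacher is obtained or which representations the consistency loss matches.

```python
# Minimal sketch of a Counterbalancing-Teacher-style consistency loss.
# Assumptions are flagged in comments; this is not the paper's reference implementation.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def strip_batchnorm(module: nn.Module) -> None:
    """Recursively replace BatchNorm layers with Identity, in place."""
    for name, child in module.named_children():
        if isinstance(child, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            setattr(module, name, nn.Identity())
        else:
            strip_batchnorm(child)

# Student: any BN-equipped network; a tiny CNN keeps the sketch self-contained.
student = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

# Teacher: a frozen copy of the same model with its BN layers removed
# (whether the teacher is pre-trained separately is not stated in the abstract).
teacher = copy.deepcopy(student)
strip_batchnorm(teacher)
for p in teacher.parameters():
    p.requires_grad_(False)
teacher.eval()

lam = 1.0  # consistency weight -- an assumed hyperparameter, not taken from the paper

def ct_loss(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Task loss on the student plus a consistency term toward the frozen teacher."""
    student_logits = student(x)
    with torch.no_grad():
        teacher_logits = teacher(x)
    task_loss = F.cross_entropy(student_logits, y)
    consistency = F.mse_loss(student_logits, teacher_logits)
    return task_loss + lam * consistency

# Usage: loss = ct_loss(images, labels); loss.backward(); optimizer.step()
```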
