Paper title
The Missing Invariance Principle Found -- the Reciprocal Twin of Invariant Risk Minimization
Paper authors
Paper abstract
Machine learning models often generalize poorly to out-of-distribution (OOD) data as a result of relying on features that are spuriously correlated with the label during training. Recently, the technique of Invariant Risk Minimization (IRM) was proposed to learn predictors that only use invariant features by conserving the feature-conditioned label expectation $\mathbb{E}_e[y|f(x)]$ across environments. However, more recent studies have demonstrated that IRM-v1, a practical version of IRM, can fail in various settings. Here, we identify a fundamental flaw of the IRM formulation that causes the failure. We then introduce a complementary notion of invariance, MRI, based on conserving the label-conditioned feature expectation $\mathbb{E}_e[f(x)|y]$, which is free of this flaw. Further, we introduce a simplified, practical version of the MRI formulation called MRI-v1. We prove that for general linear problems, MRI-v1 guarantees invariant predictors given a sufficient number of environments. We also empirically demonstrate that MRI-v1 strongly outperforms IRM-v1 and consistently achieves near-optimal OOD generalization in image-based nonlinear problems.
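The label-conditioned invariance described in the abstract, conserving $\mathbb{E}_e[f(x)|y]$ across environments, can be illustrated as a penalty that matches per-class feature means across environments. The sketch below is an assumption-laden illustration of this idea, not the paper's actual MRI-v1 objective; the function name `mri_penalty` and the data layout are hypothetical.

```python
import numpy as np

def mri_penalty(features_by_env, labels_by_env, num_classes):
    """Squared deviation of per-class feature means across environments.

    Illustrative only: estimates E_e[f(x)|y] per environment as the mean
    feature vector of each class, then penalizes how far each environment's
    class means stray from the pooled (cross-environment) class means.
    """
    class_means = []
    for feats, labels in zip(features_by_env, labels_by_env):
        # Per-class mean feature vector in this environment: E_e[f(x)|y=c].
        means = np.stack([feats[labels == c].mean(axis=0)
                          for c in range(num_classes)])
        class_means.append(means)
    class_means = np.stack(class_means)   # shape: (envs, classes, feat_dim)
    pooled = class_means.mean(axis=0)     # average class means over environments
    # Zero when every environment has identical class-conditional feature means.
    return float(((class_means - pooled) ** 2).sum())
```

Under this toy formulation, a representation whose class-conditional feature means coincide across environments incurs zero penalty, while environment-dependent (spurious) features raise it.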