Paper Title
Learning with Noisy Labels over Imbalanced Subpopulations
Paper Authors
Paper Abstract
Learning with Noisy Labels (LNL) has attracted significant attention from the research community. Many recent LNL methods rely on the assumption that clean samples tend to have "small loss". However, this assumption often fails to hold in real-world settings with imbalanced subpopulations, i.e., where training subpopulations vary in sample size or recognition difficulty. As a result, recent LNL methods risk misclassifying "informative" samples (e.g., hard samples or samples from tail subpopulations) as noisy, leading to poor generalization performance. To address this issue, we propose a novel LNL method that simultaneously deals with noisy labels and imbalanced subpopulations. It first leverages sample correlation to estimate each sample's clean probability for label correction, and then uses the corrected labels in Distributionally Robust Optimization (DRO) to further improve robustness. Specifically, in contrast to previous works that use the classification loss as the selection criterion, we introduce a feature-based metric that takes sample correlation into account when estimating samples' clean probabilities. We then refurbish the noisy labels using the estimated clean probabilities and the pseudo-labels from the model's predictions. With the refurbished labels, we train the model with DRO to make it robust to subpopulation imbalance. Extensive experiments on a wide range of benchmarks demonstrate that our technique consistently improves current state-of-the-art robust learning paradigms against noisy labels, especially in the presence of imbalanced subpopulations.
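For concreteness, below is a minimal NumPy sketch of the three ingredients the abstract names: a feature-based clean-probability estimate, label refurbishment, and a DRO training objective. All function names here are hypothetical, and the specific choices (kNN label agreement in feature space as the "sample correlation" metric, convex label mixing for refurbishment, and an exponentiated-gradient Group DRO weight update assuming subpopulation annotations are available) are common instantiations of these ideas, not necessarily the paper's exact formulations.

```python
import numpy as np

def estimate_clean_prob(features, labels, k=10):
    """Feature-based clean-probability estimate: the fraction of a sample's
    k nearest neighbours (by cosine similarity in feature space) that share
    its observed label. One plausible reading of the abstract's
    'sample correlation' criterion, used here for illustration only."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)            # exclude self-matches
    nn = np.argsort(-sim, axis=1)[:, :k]      # indices of k nearest neighbours
    return (labels[nn] == labels[:, None]).mean(axis=1)

def refurbish_labels(labels_onehot, pseudo_probs, clean_prob):
    """Refurbish noisy labels as a convex combination of the observed
    one-hot label and the model's pseudo-label, weighted by the
    estimated clean probability."""
    w = clean_prob[:, None]
    return w * labels_onehot + (1.0 - w) * pseudo_probs

def group_dro_loss(per_sample_loss, group_ids, weights=None, step_size=0.01):
    """Group DRO objective (Sagawa et al. style): up-weight the groups with
    the largest current loss so that tail subpopulations are not ignored.
    Assumes group (subpopulation) annotations; the paper's DRO formulation
    may instead handle latent subpopulations."""
    groups = np.unique(group_ids)
    if weights is None:
        weights = np.ones(len(groups)) / len(groups)
    group_losses = np.array(
        [per_sample_loss[group_ids == g].mean() for g in groups])
    weights = weights * np.exp(step_size * group_losses)  # exponentiated gradient
    weights = weights / weights.sum()                     # re-normalize
    return (weights * group_losses).sum(), weights
```

In this sketch, the DRO weights are carried across training steps, so persistently hard (e.g., tail) subpopulations accumulate weight and dominate the objective; combined with refurbished labels, this avoids the failure mode where small-loss selection discards informative tail samples as noise.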