ReMix: Calibrated Resampling for Class Imbalance in Deep learning

Paper title

ReMix: Calibrated Resampling for Class Imbalance in Deep learning

Authors

Colin Bellinger, Roberto Corizzo, Nathalie Japkowicz

Abstract

Class imbalance is a problem of significant importance in applied deep learning, where trained models are used for decision support and automated decisions in critical areas such as health and medicine, transportation, and finance. Learning deep models from imbalanced training data remains challenging, and state-of-the-art solutions are typically data dependent and focused primarily on image data. Real-world imbalanced classification problems, however, are much more diverse, which calls for a general solution that can be applied to tabular, image, and text data. In this paper, we propose ReMix, a training technique that leverages batch resampling, instance mixing, and soft labels to enable the induction of robust deep models for imbalanced learning. Our results show that dense nets and CNNs trained with ReMix generally outperform the alternatives according to the g-mean and are better calibrated according to the balanced Brier score.
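
The abstract names three ingredients (batch resampling, instance mixing, soft labels) without giving details. The following is a minimal sketch of how such a combination could look, not the authors' implementation. It assumes class-balanced sampling with replacement and a mixup-style convex combination with a Beta-distributed weight; the function names, the `alpha` parameter, and its default are hypothetical. A small g-mean helper is included, using the usual definition as the geometric mean of per-class recalls.

```python
import numpy as np

def balanced_mixed_batch(X, y, n_classes, batch_size, alpha=0.2, rng=None):
    """Draw a class-balanced batch, then mix pairs of instances (mixup-style).

    Illustrative only: assumes every class has at least one instance in y.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Batch resampling: pick a class uniformly, then one instance of that class.
    classes = rng.integers(0, n_classes, size=batch_size)
    idx = np.array([rng.choice(np.flatnonzero(y == c)) for c in classes])
    xb = X[idx].astype(float)
    yb = np.eye(n_classes)[y[idx]]  # one-hot labels
    # Instance mixing: convex combination with a shuffled copy of the batch.
    lam = rng.beta(alpha, alpha, size=batch_size)
    perm = rng.permutation(batch_size)
    lam_x = lam.reshape(-1, *([1] * (xb.ndim - 1)))  # broadcast over features
    x_mix = lam_x * xb + (1 - lam_x) * xb[perm]
    # Soft labels: the same convex combination applied to the one-hot labels.
    y_mix = lam[:, None] * yb + (1 - lam[:, None]) * yb[perm]
    return x_mix, y_mix

def g_mean(y_true, y_pred, n_classes):
    """Geometric mean of per-class recalls (assumes each class occurs in y_true)."""
    recalls = [np.mean(y_pred[y_true == c] == c) for c in range(n_classes)]
    return float(np.prod(recalls) ** (1.0 / n_classes))
```

Each row of `y_mix` sums to 1, so it can be fed directly to a cross-entropy loss that accepts probability targets. Because the g-mean is a product of recalls, it drops to zero whenever any single class is never predicted correctly, which is why it is a common choice for imbalanced evaluation.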
