Paper Title
Adaptive Mixing of Auxiliary Losses in Supervised Learning
Paper Authors
Paper Abstract
In several supervised learning scenarios, auxiliary losses are used to introduce additional information or constraints into the supervised learning objective. For instance, knowledge distillation aims to mimic the outputs of a powerful teacher model; similarly, in rule-based approaches, weak labeling information is provided by labeling functions, which may be noisy rule-based approximations of the true labels. We tackle the problem of learning to combine these losses in a principled manner. Our proposal, AMAL, uses a bi-level optimization criterion on validation data to learn optimal mixing weights, at an instance level, over the training data. We describe a meta-learning approach to solving this bi-level objective and show how it can be applied to different scenarios in supervised learning. Experiments in a number of knowledge distillation and rule-denoising domains show that AMAL provides noticeable gains over competitive baselines. We empirically analyze our method and share insights into the mechanisms through which it delivers these performance gains.
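To make the bi-level idea concrete, below is a minimal sketch in PyTorch of how instance-level mixing weights could be meta-learned in a knowledge-distillation setting. It is an illustrative assumption of the general recipe (one virtual inner SGD step on the mixed training loss, then an outer gradient step on the mixing weights through the validation loss), not the paper's actual implementation; all names (mix_logits, W_virtual, the toy data and learning rates) are hypothetical.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, k, n_train, n_val = 20, 5, 64, 32

# Toy data: inputs, hard labels, and soft teacher outputs (the auxiliary signal).
x_tr = torch.randn(n_train, d)
y_tr = torch.randint(0, k, (n_train,))
t_tr = F.softmax(torch.randn(n_train, k), dim=1)   # hypothetical teacher probabilities
x_va = torch.randn(n_val, d)
y_va = torch.randint(0, k, (n_val,))

W = (0.01 * torch.randn(d, k)).requires_grad_(True)   # linear student parameters
mix_logits = torch.zeros(n_train, requires_grad=True) # one mixing weight per training instance
lr_inner, lr_outer = 0.1, 0.05

for step in range(100):
    lam = torch.sigmoid(mix_logits)                       # instance-level mixing weights in (0, 1)
    logits = x_tr @ W
    ce = F.cross_entropy(logits, y_tr, reduction="none")  # supervised loss per instance
    kd = F.kl_div(F.log_softmax(logits, dim=1), t_tr,
                  reduction="none").sum(dim=1)            # auxiliary (distillation) loss per instance
    inner = (lam * ce + (1.0 - lam) * kd).mean()          # mixed training objective

    # Inner level: a virtual SGD step on the student, keeping the graph so the
    # validation loss stays differentiable w.r.t. the mixing weights.
    gW, = torch.autograd.grad(inner, W, create_graph=True)
    W_virtual = W - lr_inner * gW

    # Outer level: update the mixing weights to reduce the validation loss of
    # the virtually updated student (the bi-level / meta objective).
    val_loss = F.cross_entropy(x_va @ W_virtual, y_va)
    gm, = torch.autograd.grad(val_loss, mix_logits)

    with torch.no_grad():
        mix_logits -= lr_outer * gm        # meta update of the mixing weights
        W -= lr_inner * gW.detach()        # actual student update

The same template extends to the rule-denoising setting by replacing the teacher term with per-instance losses against labeling-function outputs.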