Paper Title

Ensemble Distillation for Structured Prediction: Calibrated, Accurate, Fast-Choose Three

Paper Authors

Steven Reich, David Mueller, Nicholas Andrews

Paper Abstract

Modern neural networks do not always produce well-calibrated predictions, even when trained with a proper scoring function such as cross-entropy. In classification settings, simple methods such as isotonic regression or temperature scaling may be used in conjunction with a held-out dataset to calibrate model outputs. However, extending these methods to structured prediction is not always straightforward or effective; furthermore, a held-out calibration set may not always be available. In this paper, we study ensemble distillation as a general framework for producing well-calibrated structured prediction models while avoiding the prohibitive inference-time cost of ensembles. We validate this framework on two tasks: named-entity recognition and machine translation. We find that, across both tasks, ensemble distillation produces models which retain much of, and occasionally improve upon, the performance and calibration benefits of ensembles, while only requiring a single model during test-time.
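To make the core idea concrete, below is a minimal, hypothetical sketch of token-level ensemble distillation in PyTorch. The function name, tensor shapes, and loss form are illustrative assumptions, not the authors' implementation: a student is trained to match the averaged predictive distribution of an ensemble of teachers at each output position, so that a single model can approximate the ensemble's calibrated probabilities at test time.

```python
# Hypothetical sketch of token-level ensemble distillation (not the paper's
# exact objective). The student minimizes the cross-entropy between the
# averaged teacher distribution and its own distribution at every position.
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits, teacher_logits_list):
    """Distillation loss against the mean of the teachers' distributions.

    student_logits:      (batch, seq_len, vocab) logits from the student.
    teacher_logits_list: list of (batch, seq_len, vocab) logits, one per teacher.
    """
    # Average the teachers' probability distributions (not their logits).
    teacher_probs = torch.stack(
        [F.softmax(t, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    # Cross-entropy between averaged teacher distribution and the student;
    # equivalent to KL divergence up to the (constant) teacher entropy.
    log_student = F.log_softmax(student_logits, dim=-1)
    return -(teacher_probs * log_student).sum(dim=-1).mean()

# Toy usage: 3 teachers, batch of 2 sequences of length 5, vocabulary of 10.
if __name__ == "__main__":
    torch.manual_seed(0)
    teachers = [torch.randn(2, 5, 10) for _ in range(3)]
    student = torch.randn(2, 5, 10, requires_grad=True)
    loss = ensemble_distillation_loss(student, teachers)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")
```

In this sketch the teachers' probabilities are averaged rather than their logits, which matches the usual view of an ensemble as a uniform mixture over member predictions; only the student's parameters receive gradients.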
