Paper Title

Interpretable Companions for Black-Box Models

Authors

Danqing Pan, Tong Wang, Satoshi Hara

Abstract

We present an interpretable companion model for any pre-trained black-box classifier. The idea is that for any input, a user can decide to either receive a prediction from the black-box model, with high accuracy but no explanation, or employ a companion rule to obtain an interpretable prediction with slightly lower accuracy. The companion model is trained from data and the predictions of the black-box model, with an objective that combines the area under the transparency-accuracy curve and model complexity. Our model provides flexible choices for practitioners who face the dilemma of choosing between always using an interpretable model and always using a black-box model for a predictive task: for any given input, users can fall back on an interpretable prediction if they find its predictive performance satisfying, or stick with the black-box model if the rules are unsatisfying. To show the value of companion models, we design a human evaluation with more than a hundred participants to investigate the accuracy loss humans will tolerate in order to gain interpretability.
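The per-input choice the abstract describes, and the transparency-accuracy curve in the training objective, can be illustrated with a minimal sketch. This is not the paper's training algorithm; `companion_predict` and `transparency_accuracy_auc` are hypothetical helper names, and the rule representation (a list of condition-label pairs) is an assumption for illustration.

```python
# Hypothetical sketch of a companion model at prediction time:
# answer with an interpretable rule when one fires, otherwise
# defer to the pre-trained black box.

def companion_predict(x, rules, black_box):
    """Return (prediction, is_interpretable) for input x.

    rules: ordered list of (condition, label) pairs; the first
    matching condition yields an interpretable prediction.
    black_box: callable giving the opaque model's prediction.
    """
    for condition, label in rules:
        if condition(x):
            return label, True          # interpretable prediction
    return black_box(x), False          # fall back to the black box


def transparency_accuracy_auc(points):
    """Trapezoidal area under a transparency-accuracy curve.

    points: (transparency, accuracy) pairs sorted by transparency,
    where transparency is the fraction of inputs covered by rules.
    """
    area = 0.0
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        area += (t1 - t0) * (a0 + a1) / 2.0
    return area
```

For example, a single rule "predict 1 when x > 0" covers positive inputs interpretably, while everything else is routed to the black box; sweeping the covered fraction from 0 to 1 and recording accuracy at each level traces the curve whose area the objective rewards.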
