Paper Title
Explaining predictive models with mixed features using Shapley values and conditional inference trees
Paper Authors
Paper Abstract
It is becoming increasingly important to explain complex, black-box machine learning models. Although there is an expanding literature on this topic, Shapley values stand out as a sound method to explain predictions from any type of machine learning model. The original development of Shapley values for prediction explanation relied on the assumption that the features being described were independent. This methodology was then extended to explain dependent features with an underlying continuous distribution. In this paper, we propose a method to explain mixed (i.e. continuous, discrete, ordinal, and categorical) dependent features by modeling the dependence structure of the features using conditional inference trees. We demonstrate our proposed method against the current industry standards in various simulation studies and find that our method often outperforms the other approaches. Finally, we apply our method to a real financial data set used in the 2018 FICO Explainable Machine Learning Challenge and show how our explanations compare to those of the FICO challenge Recognition Award winning team.
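As a point of reference (not part of the abstract itself), the Shapley value attribution underlying this line of work assigns feature j of the instance x* being explained the contribution

\phi_j = \sum_{S \subseteq \mathcal{M} \setminus \{j\}} \frac{|S|!\,(M - |S| - 1)!}{M!} \bigl( v(S \cup \{j\}) - v(S) \bigr),

where \mathcal{M} is the set of all M features and v(S) = \mathrm{E}\bigl[ f(x) \mid x_S = x_S^* \bigr] is the expected model prediction conditional on the features in the subset S. The contribution of the paper is in estimating this conditional expectation when the features are dependent and of mixed type, by using conditional inference trees to model the conditional distribution of the features outside S given those in S.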