Paper Title

GAM(L)A: An econometric model for interpretable Machine Learning

Authors

Emmanuel Flachaire, Gilles Hacheme, Sullivan Hué, Sébastien Laurent

Abstract

Despite their high predictive performance, random forest and gradient boosting are often considered black boxes or uninterpretable models, which has raised concerns among practitioners and regulators. As an alternative, we propose in this paper to use partial linear models, which are inherently interpretable. Specifically, this article introduces GAM-lasso (GAMLA) and GAM-autometrics (GAMA), denoted GAM(L)A for short. GAM(L)A combines parametric and non-parametric functions to accurately capture the linearities and non-linearities prevailing between dependent and explanatory variables, together with a variable selection procedure to control for overfitting. Estimation relies on a two-step procedure building upon the double residual method. We illustrate the predictive performance and interpretability of GAM(L)A on a regression problem and a classification problem. The results show that GAM(L)A outperforms parametric models augmented by quadratic, cubic and interaction effects. Moreover, the results suggest that the performance of GAM(L)A is not significantly different from that of random forest and gradient boosting.
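The abstract names the double residual method without spelling it out. As a rough illustration only, the sketch below applies the textbook form of that method to a toy partially linear model y = beta * x + g(z) + noise. It is not the authors' GAM(L)A procedure: the kernel smoother, the bandwidth, the simulated data and all function names are assumptions made for this example, and the variable selection step (lasso or Autometrics) is omitted entirely.

```python
import math
import random


def kernel_smooth(z, v, bandwidth):
    """Nadaraya-Watson estimate of E[v | z] at every sample point (Gaussian kernel)."""
    fitted = []
    for zi in z:
        weights = [math.exp(-0.5 * ((zi - zj) / bandwidth) ** 2) for zj in z]
        total = sum(weights)
        fitted.append(sum(w * vj for w, vj in zip(weights, v)) / total)
    return fitted


def double_residual_beta(y, x, z, bandwidth=0.2):
    """Estimate beta in y = beta * x + g(z) + noise by the double residual method."""
    # Step 1: remove the part of y and of x that is explained by z.
    y_hat = kernel_smooth(z, y, bandwidth)
    x_hat = kernel_smooth(z, x, bandwidth)
    ry = [yi - yh for yi, yh in zip(y, y_hat)]
    rx = [xi - xh for xi, xh in zip(x, x_hat)]
    # Step 2: ordinary least squares of the y residuals on the x residuals.
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)


def simulate(n=400, beta=2.0, seed=0):
    """Hypothetical toy data: x depends on z; y is linear in x, nonlinear in z."""
    rng = random.Random(seed)
    z = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    x = [0.5 * zi + rng.gauss(0.0, 1.0) for zi in z]
    y = [beta * xi + math.sin(2.0 * zi) + rng.gauss(0.0, 0.3)
         for xi, zi in zip(x, z)]
    return y, x, z


y, x, z = simulate()
beta_hat = double_residual_beta(y, x, z)
print(f"estimated beta: {beta_hat:.3f}")
```

Because both y and x are purged of their dependence on z before the linear fit, the slope on the residuals should land close to the true linear coefficient (2.0 in this simulation) even though g is never specified parametrically.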
