Paper Title

Margin Optimal Classification Trees

Authors

Federico D'Onofrio, Giorgio Grani, Marta Monaci, Laura Palagi

Abstract

In recent years, there has been growing attention to interpretable machine learning models which can give explanatory insights on their behaviour. Thanks to their interpretability, decision trees have been intensively studied for classification tasks and, due to the remarkable advances in mixed integer programming (MIP), various approaches have been proposed to formulate the problem of training an Optimal Classification Tree (OCT) as a MIP model. We present a novel mixed integer quadratic formulation for the OCT problem, which exploits the generalization capabilities of Support Vector Machines for binary classification. Our model, denoted as Margin Optimal Classification Tree (MARGOT), encompasses maximum margin multivariate hyperplanes nested in a binary tree structure. To enhance the interpretability of our approach, we analyse two alternative versions of MARGOT, which include feature selection constraints inducing sparsity of the hyperplanes' coefficients. First, MARGOT has been tested on non-linearly separable synthetic datasets in a 2-dimensional feature space to provide a graphical representation of the maximum margin approach. Finally, the proposed models have been tested on benchmark datasets from the UCI repository. The MARGOT formulation turns out to be easier to solve than other OCT approaches, and the generated tree better generalizes on new observations. The two interpretable versions effectively select the most relevant features, maintaining good prediction quality.
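The abstract rests on the maximum-margin idea from Support Vector Machines: separate two classes with the hyperplane that maximizes the margin to the nearest points. As a hedged illustration only (this is not the paper's MIQP formulation, nor its tree-structured model), the pure-Python sketch below trains a single soft-margin linear separator by subgradient descent on the hinge-loss objective; the function name `train_linear_svm`, its hyperparameters, and the toy data are all illustrative assumptions.

```python
import random

def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Illustrative sketch: subgradient descent on the soft-margin SVM
    objective  lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).
    `labels` must be +1 / -1. Not the MIQP approach used by MARGOT."""
    rnd = random.Random(seed)
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    idx = list(range(len(points)))
    for _ in range(epochs):
        rnd.shuffle(idx)
        for i in idx:
            x, y = points[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:
                # Point violates the margin: hinge-loss subgradient step.
                w = [wj - lr * (lam * wj - y * xj) for wj, xj in zip(w, x)]
                b += lr * y
            else:
                # Only the regularizer contributes to the gradient here.
                w = [wj - lr * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    """Classify a point by the sign of the learned linear separator."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

MARGOT nests maximum-margin hyperplanes of this kind at the branch nodes of a binary tree and solves for all of them jointly via mixed integer quadratic programming, rather than fitting each one independently as above.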
