Paper Title
Oblique Predictive Clustering Trees
Paper Authors
Paper Abstract
Predictive clustering trees (PCTs) are a well-established generalization of standard decision trees that can be used to solve a variety of predictive modeling tasks, including structured output prediction. Combining them into ensembles yields state-of-the-art performance. Furthermore, ensembles of PCTs can be interpreted by calculating feature importance scores from the learned models. However, their learning time scales poorly with the dimensionality of the output space. This is often problematic, especially in (hierarchical) multi-label classification, where the output can consist of hundreds of potential labels. Moreover, PCT learning cannot exploit data sparsity to improve computational efficiency, even though sparsity is common in both input spaces (molecular fingerprints, bag-of-words representations) and output spaces (in multi-label classification, examples are often labeled with only a fraction of the possible labels). In this paper, we propose oblique predictive clustering trees, which address these limitations. We design and implement two methods for learning oblique splits, whose tests contain linear combinations of features; hence, each split corresponds to an arbitrary hyperplane in the input space. The methods are efficient for high-dimensional data and capable of exploiting data sparsity. We experimentally evaluate the proposed methods on 60 benchmark datasets spanning 6 predictive modeling tasks. The results show that oblique predictive clustering trees achieve performance on par with state-of-the-art methods and are orders of magnitude faster than standard PCTs. We also show that meaningful feature importance scores can be extracted from models learned with the proposed methods.
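To make the core idea concrete: where a standard PCT tests a single feature against a threshold, an oblique split tests a linear combination of features, i.e. whether w · x + b ≤ 0. The minimal sketch below illustrates this contrast and why a sparse input representation helps; it is not the paper's implementation, and all weights, thresholds, and data in it are hypothetical.

```python
import numpy as np
from scipy import sparse

# Illustrative sketch (not the paper's actual learning algorithm) contrasting
# the axis-parallel tests of standard PCTs with oblique tests. All weights,
# thresholds, and data below are hypothetical.

def axis_parallel_test(x, feature, threshold):
    """Standard PCT test: compare a single feature to a threshold."""
    return x[feature] <= threshold

def oblique_test(x, weights, bias):
    """Oblique test: w . x + b <= 0, i.e. an arbitrary hyperplane in the
    input space rather than an axis-aligned one."""
    return x.dot(weights) + bias <= 0.0

x = np.array([0.8, 0.3])
print(axis_parallel_test(x, feature=0, threshold=0.5))   # False: 0.8 > 0.5
print(oblique_test(x, np.array([-1.0, 1.0]), bias=0.0))  # True: -0.8 + 0.3 <= 0

# An oblique test is a single dot product, so with a sparse representation
# (e.g. molecular fingerprints or bag-of-words inputs) only the nonzero
# entries of x need to be touched when evaluating the split.
x_sparse = sparse.csr_matrix([[0.0, 0.0, 0.8, 0.0, 0.3]])
w = np.array([0.2, -0.4, -1.0, 0.5, 1.0])
print(x_sparse.dot(w) + 0.0 <= 0.0)                      # [ True]
```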