Paper Title
Robust Similarity and Distance Learning via Decision Forests
Paper Authors
Paper Abstract
Canonical distances such as Euclidean distance often fail to capture the appropriate relationships between items, which leads to subpar inference and prediction. Many algorithms have been proposed for automatically learning suitable distances, most of which employ linear methods to learn a global metric over the feature space. While such methods offer nice theoretical properties, interpretability, and computationally efficient implementations, they are limited in expressive capacity. Methods designed to improve expressiveness sacrifice one or more of the nice properties of the linear methods. To bridge this gap, we propose a highly expressive novel decision forest algorithm for the task of distance learning, which we call Similarity and Metric Random Forests (SMERF). We show that the tree construction procedure in SMERF is a proper generalization of standard classification and regression trees. Thus, the mathematical driving forces of SMERF are examined via its direct connection to regression forests, for which theory has been developed. Its ability to approximate arbitrary distances and identify important features is empirically demonstrated on simulated data sets. Finally, we demonstrate that it accurately predicts links in networks.
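To make the abstract's opening point concrete, the following minimal sketch contrasts Euclidean distance with a global linear metric of the kind the abstract describes. It is not SMERF and is not taken from the paper: the toy points, the diagonal form of the metric, and the hand-picked weights (standing in for learned ones) are all assumptions chosen for illustration.

```python
import math

def euclidean(x, y):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def diagonal_metric(x, y, weights):
    """Global linear metric with a diagonal weight matrix:
    sqrt(sum_k w_k * (x_k - y_k)^2), with every w_k >= 0."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

# Hypothetical toy items: feature 0 is informative, feature 1 is noise
# on a much larger scale.
a = (0.0, 0.0)
b = (0.1, 5.0)  # assumed related to a: close in feature 0
c = (1.0, 0.2)  # assumed unrelated to a: far in feature 0

# Hand-picked weights standing in for a learned metric that ignores
# the noisy feature (an assumption, not a fitted result).
weights = (1.0, 0.0)

# Euclidean distance is dominated by the noisy feature and ranks c closer to a.
print(euclidean(a, b) > euclidean(a, c))
# The weighted metric ranks b closer to a, matching the assumed relationship.
print(diagonal_metric(a, b, weights) < diagonal_metric(a, c, weights))
```

Because the same weights apply everywhere in the feature space, a metric of this form cannot express a notion of distance that changes from one region to another, which is the limited expressive capacity the abstract attributes to linear methods.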