Paper title
Error Scaling Laws for Kernel Classification under Source and Capacity Conditions
Paper authors
Paper abstract
We consider the problem of kernel classification. While worst-case bounds on the decay rate of the prediction error with the number of samples are known for some classifiers, they often fail to accurately describe the learning curves of real data sets. In this work, we consider the important class of data sets satisfying the standard source and capacity conditions, which, as we show numerically, comprises a number of real data sets. Under the Gaussian design, we derive the decay rates for the misclassification (prediction) error as a function of the source and capacity coefficients. We do so for two standard kernel classification settings, namely margin-maximizing Support Vector Machines (SVM) and ridge classification, and contrast the two methods. We find that our rates tightly describe the learning curves for this class of data sets, and are also observed on real data. Our results can also be seen as an explicit prediction of the exponents of a scaling law for kernel classification that is accurate on some real data sets.
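For readers unfamiliar with the assumptions, one standard formulation of the source and capacity conditions (in the Caponnetto–De Vito style; the paper's exact conventions may differ) expands the target function f* in the kernel's eigenbasis, with eigenvalues λ_k and coefficients f*_k:

```latex
\lambda_k \asymp k^{-\alpha},\ \alpha > 1 \quad \text{(capacity)},
\qquad
\sum_k \lambda_k^{\,1-2r}\,(f^*_k)^2 < \infty,\ r > 0 \quad \text{(source)}.
```

The capacity coefficient α controls how fast the kernel spectrum decays, while the source coefficient r measures how well the target aligns with the top of the spectrum; the decay rates derived in the paper are functions of (α, r).

Below is a minimal, self-contained sketch of how such a scaling-law exponent can be measured numerically under a synthetic Gaussian design. The construction (dimension d, coefficients alpha and r, the teacher parametrization theta, and the ridge regularization reg) is our own illustrative choice, not the paper's experimental setup; it fits a power law, error ≈ n^(-β), to the learning curve of a ridge classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Truncated eigenbasis model under Gaussian design (illustrative assumption):
# eigenvalues decay as k^{-alpha} (capacity); the teacher coefficients are a
# borderline choice compatible with source coefficient r.
d, alpha, r = 500, 1.5, 0.5
k = np.arange(1, d + 1)
lam = k ** (-alpha)
theta = k ** (-(1 + alpha * (2 * r - 1)) / 2)

def sample(n):
    """Draw n points with feature covariance diag(lam) and sign-teacher labels."""
    phi = rng.standard_normal((n, d)) * np.sqrt(lam)
    return phi, np.sign(phi @ theta)

def ridge_error(n, reg=1e-4, n_test=20000):
    """Misclassification error of ridge classification trained on n samples."""
    phi, y = sample(n)
    w = np.linalg.solve(phi.T @ phi + n * reg * np.eye(d), phi.T @ y)
    phi_t, y_t = sample(n_test)
    return np.mean(np.sign(phi_t @ w) != y_t)

ns = np.array([100, 200, 400, 800, 1600, 3200])
errs = np.array([np.mean([ridge_error(n) for _ in range(5)]) for n in ns])

# Scaling-law exponent: slope of the learning curve on log-log axes.
beta = -np.polyfit(np.log(ns), np.log(errs), 1)[0]
print(f"estimated exponent beta ~ {beta:.2f}")
```

A margin-maximizing SVM baseline could be compared by swapping the ridge solve for, e.g., sklearn.svm.LinearSVC on the same features; the paper's contribution is that the exponents of both methods can be derived and contrasted analytically for this class of data sets.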