Predicting Classification Accuracy When Adding New Unobserved Classes

Paper Title

Predicting Classification Accuracy When Adding New Unobserved Classes

Authors

Yuli Slavutsky, Yuval Benjamini

Abstract

Multiclass classifiers are often designed and evaluated only on a sample from the classes on which they will eventually be applied. Hence, their final accuracy remains unknown. In this work we study how a classifier's performance over the initial class sample can be used to extrapolate its expected accuracy on a larger, unobserved set of classes. For this, we define a measure of separation between correct and incorrect classes that is independent of the number of classes: the "reversed ROC" (rROC), which is obtained by replacing the roles of classes and data-points in the common ROC. We show that the classification accuracy is a function of the rROC in multiclass classifiers, for which the learned representation of data from the initial class sample remains unchanged when new classes are added. Using these results we formulate a robust neural-network-based algorithm, "CleaneX", which learns to estimate the accuracy of such classifiers on arbitrarily large sets of classes. Unlike previous methods, our method uses both the observed accuracies of the classifier and densities of classification scores, and therefore achieves remarkably better predictions than current state-of-the-art methods on both simulations and real datasets of object detection, face recognition, and brain decoding.
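The link between class separation and accuracy can be illustrated with a minimal sketch. This is not the CleaneX algorithm, which the abstract describes as neural-network-based; it is only a naive plug-in estimate under two assumptions: classes are drawn independently, and the score a data point receives for a given class does not change when other classes are added. Under those assumptions, a data point is classified correctly among k classes only if its correct class outscores each of the k - 1 incorrect ones, so the expected accuracy is the average of c ** (k - 1), where c is the probability that a random incorrect class scores below the correct one. The function names and the toy score matrix are hypothetical.

```python
def per_point_separation(scores, labels):
    """For each data point, return the fraction of incorrect classes whose
    score is strictly below the score of the correct class."""
    seps = []
    for row, y in zip(scores, labels):
        wrong = [s for j, s in enumerate(row) if j != y]
        seps.append(sum(s < row[y] for s in wrong) / len(wrong))
    return seps


def extrapolated_accuracy(seps, k):
    """Naive plug-in estimate of accuracy with k classes: the mean of
    c ** (k - 1) over data points (assumes independently sampled classes)."""
    return sum(c ** (k - 1) for c in seps) / len(seps)


# Toy example (hypothetical scores): 3 data points scored against 3 classes.
scores = [
    [0.9, 0.1, 0.2],
    [0.3, 0.5, 0.4],
    [0.6, 0.7, 0.1],
]
labels = [0, 1, 0]  # index of the correct class for each data point

seps = per_point_separation(scores, labels)
for k in (2, 3, 10, 100):
    print(k, extrapolated_accuracy(seps, k))
```

Because each c is itself estimated from a small number of observed classes, this plug-in estimate is biased for large k. Per the abstract, the method in the paper instead learns the estimate from both the observed accuracies and the densities of classification scores.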
