Title
Estimating Multi-label Accuracy using Labelset Distributions
Authors
Abstract
A multi-label classifier estimates, for any given instance, the binary label state (relevant vs. irrelevant) of each label in a set of concept labels. Probabilistic multi-label classifiers provide a predictive posterior distribution over all possible labelset combinations of such label states (the powerset of labels), from which we can provide the best estimate simply by selecting the labelset with the largest expected accuracy over that distribution. For example, when maximizing exact match accuracy, we provide the mode of the distribution. But how does this relate to the confidence we may have in such an estimate? Confidence is an important element of real-world applications of multi-label classifiers (as in machine learning in general) and an important ingredient in explainability and interpretability. However, it is not obvious how to provide confidence in the multi-label context in relation to a particular accuracy metric, nor is it clear how to provide a confidence that correlates well with the expected accuracy, which would be most valuable in real-world decision making. In this article we estimate the expected accuracy as a surrogate for confidence, for a given accuracy metric. We hypothesise that the expected accuracy can be estimated from the multi-label predictive distribution. We examine seven candidate functions for their ability to estimate expected accuracy from the predictive distribution. We found that three of these correlate with expected accuracy and are robust. Further, we determined that each candidate function can be used separately to estimate Hamming similarity, but that a combination of the candidates was best for expected Jaccard index and exact match.
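The selection rule described in the abstract, choosing the labelset with the largest expected accuracy under the predictive distribution, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the toy distribution, and the convention that two empty labelsets have a Jaccard index of 1 are all assumptions made for illustration, and the seven candidate functions studied in the paper are not reproduced here.

```python
from itertools import product

# A labelset is a tuple of 0/1 label states, e.g. (1, 0, 0).
# A predictive distribution is a dict mapping labelsets to probabilities.

def exact_match(y, z):
    """1 if both labelsets are identical, otherwise 0."""
    return 1.0 if y == z else 0.0

def hamming_similarity(y, z):
    """Fraction of label positions on which the two labelsets agree."""
    return sum(a == b for a, b in zip(y, z)) / len(y)

def jaccard_index(y, z):
    """Size of the intersection over size of the union of relevant labels."""
    union = sum(a or b for a, b in zip(y, z))
    if union == 0:
        return 1.0  # assumed convention: two empty labelsets agree fully
    intersection = sum(a and b for a, b in zip(y, z))
    return intersection / union

def expected_accuracy(candidate, dist, metric):
    """Expectation of metric(candidate, Y) with Y drawn from dist."""
    return sum(p * metric(candidate, y) for y, p in dist.items())

def best_estimate(dist, metric):
    """Search the full powerset for the labelset with the largest expected accuracy."""
    n_labels = len(next(iter(dist)))
    candidates = list(product((0, 1), repeat=n_labels))
    best = max(candidates, key=lambda c: expected_accuracy(c, dist, metric))
    return best, expected_accuracy(best, dist, metric)

# Hypothetical predictive distribution over three labels (illustrative only).
toy_dist = {
    (1, 0, 0): 0.40,
    (1, 1, 0): 0.35,
    (0, 1, 1): 0.20,
    (0, 0, 0): 0.05,
}

for metric in (exact_match, hamming_similarity, jaccard_index):
    labelset, score = best_estimate(toy_dist, metric)
    print(metric.__name__, labelset, round(score, 4))
```

On this toy distribution the exact match estimate is the mode, (1, 0, 0), with expected accuracy 0.4, whereas Hamming similarity selects (1, 1, 0), so the best estimate and its expected accuracy both depend on the chosen metric. The exhaustive search grows exponentially with the number of labels and is practical only for small label sets.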