Paper Title
GPM: A Generic Probabilistic Model to Recover Annotator's Behavior and Ground Truth Labeling
Paper Authors
Paper Abstract
In the big data era, data labels can be obtained through crowdsourcing. Nevertheless, the obtained labels are generally noisy, unreliable, or even adversarial. In this paper, we propose a probabilistic graphical annotation model to infer the underlying ground truth and the annotators' behavior. To accommodate both discrete and continuous application scenarios (e.g., classifying scenes vs. rating videos on a Likert scale), the underlying ground truth is modeled as following a distribution rather than taking a single value. In this way, the reliable but potentially divergent opinions of "good" annotators can be recovered. The proposed model is able to identify whether an annotator has worked diligently on the task during the labeling procedure, which can be used to select qualified annotators. Our model has been tested on both simulated and real-world data, where it always outperforms the other state-of-the-art models in terms of accuracy and robustness.