Paper Title
Measuring Annotator Agreement Generally across Complex Structured, Multi-object, and Free-text Annotation Tasks
Paper Authors
Paper Abstract
When annotators label data, a key metric for quality assurance is inter-annotator agreement (IAA): the extent to which annotators agree on their labels. Though many IAA measures exist for simple categorical and ordinal labeling tasks, relatively little work has considered more complex labeling tasks, such as structured, multi-object, and free-text annotations. Krippendorff's alpha, best known for use with simpler labeling tasks, does have a distance-based formulation with broader applicability, but little work has studied its efficacy and consistency across complex annotation tasks. We investigate the design and evaluation of IAA measures for complex annotation tasks, with evaluation spanning seven diverse tasks: image bounding boxes, image keypoints, text sequence tagging, ranked lists, free text translations, numeric vectors, and syntax trees. We identify the difficulty of interpretability and the complexity of choosing a distance function as key obstacles in applying Krippendorff's alpha generally across these tasks. We propose two novel, more interpretable measures, showing they yield more consistent IAA measures across tasks and annotation distance functions.
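The distance-based formulation of Krippendorff's alpha mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard formulation (alpha = 1 − observed disagreement / expected disagreement), not the paper's proposed measures; the function names and the example distance functions are illustrative choices, not from the source.

```python
def krippendorff_alpha(units, distance):
    """Distance-based Krippendorff's alpha (standard formulation).

    units: list of lists; each inner list holds the labels assigned to one
           item by its annotators (only items with >= 2 labels contribute).
    distance: symmetric disagreement function d(a, b) -> float, with
              d(a, a) == 0. Swapping this function in is what extends alpha
              to structured labels (e.g. 1 - IoU for bounding boxes).
    """
    units = [u for u in units if len(u) >= 2]
    n = sum(len(u) for u in units)  # total number of pairable labels

    # Observed disagreement: average pairwise distance within each unit.
    d_o = 0.0
    for u in units:
        m = len(u)
        d_o += sum(distance(a, b)
                   for i, a in enumerate(u)
                   for j, b in enumerate(u) if i != j) / (m - 1)
    d_o /= n

    # Expected disagreement: average pairwise distance over all labels
    # pooled across units, as if annotators labeled at random.
    pooled = [v for u in units for v in u]
    d_e = sum(distance(a, b)
              for i, a in enumerate(pooled)
              for j, b in enumerate(pooled) if i != j) / (n * (n - 1))

    return 1.0 - d_o / d_e
```

With a nominal distance (`0` if labels match, `1` otherwise) this reduces to the familiar categorical alpha; the abstract's point is that the choice of `distance` for complex tasks is itself a nontrivial design decision.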