Semi-Automatic Data Annotation guided by Feature Space Projection

Paper Title

Semi-Automatic Data Annotation guided by Feature Space Projection

Authors

Benato, Barbara Caroline, Gomes, Jancarlo Ferreira, Telea, Alexandru Cristian, Falcão, Alexandre Xavier

Abstract

Data annotation using visual inspection (supervision) of each training sample can be laborious. Interactive solutions alleviate this by helping experts propagate labels from a few supervised samples to unlabeled ones based solely on the visual analysis of their feature space projection (with no further sample supervision). We present a semi-automatic data annotation approach based on suitable feature space projection and semi-supervised label estimation. We validate our method on the popular MNIST dataset and on images of human intestinal parasites with and without fecal impurities, a large and diverse dataset that makes classification very hard. We evaluate two approaches for semi-supervised learning from the latent and projection spaces, to choose the one that best reduces user annotation effort and also increases classification accuracy on unseen data. Our results demonstrate the added value of visual analytics tools that combine complementary abilities of humans and machines for more effective machine learning.
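The abstract names semi-supervised label estimation in the projection space but does not spell out the algorithm. As a rough illustration only, the sketch applies classic graph-based label propagation to points of a 2D projection: a few seed points carry expert-given labels, and those labels diffuse to unlabeled points through a Gaussian affinity graph, with seeds re-clamped after every step. This is not the authors' method; the function `propagate_labels`, the kernel bandwidth `sigma`, and the fixed iteration count are hypothetical choices made for this example.

```python
import math


def propagate_labels(points, seeds, n_classes, sigma=1.0, n_iter=200):
    """Hypothetical sketch: graph-based label propagation on a 2D projection.

    points    -- list of (x, y) coordinates, e.g. a 2D projection of features
    seeds     -- dict {point index: class id} for the few user-labeled samples
    n_classes -- number of classes
    sigma     -- Gaussian kernel bandwidth (assumed value; tune per dataset)
    Returns (labels, confidences), one entry per point.
    """
    n = len(points)

    # Row-stochastic transition matrix P built from Gaussian affinities (no self-loops).
    P = []
    for i, (xi, yi) in enumerate(points):
        row = []
        for j, (xj, yj) in enumerate(points):
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            row.append(0.0 if j == i else math.exp(-d2 / (2 * sigma ** 2)))
        s = sum(row)
        if s > 0:
            P.append([w / s for w in row])
        else:
            # Isolated point (all affinities underflowed): it keeps its own distribution.
            P.append([1.0 if j == i else 0.0 for j in range(n)])

    def one_hot(c):
        return [1.0 if k == c else 0.0 for k in range(n_classes)]

    # Seeds start one-hot; unlabeled points start with a uniform class distribution.
    F = [[1.0 / n_classes] * n_classes for _ in range(n)]
    for i, c in seeds.items():
        F[i] = one_hot(c)

    for _ in range(n_iter):
        # F <- P F: each point takes the affinity-weighted average of its neighbours.
        F = [[sum(P[i][j] * F[j][k] for j in range(n)) for k in range(n_classes)]
             for i in range(n)]
        # Clamp the user-labeled seeds back to their given labels.
        for i, c in seeds.items():
            F[i] = one_hot(c)

    labels = [max(range(n_classes), key=lambda k: f[k]) for f in F]
    confidences = [max(f) / sum(f) if sum(f) > 0 else 0.0 for f in F]
    return labels, confidences


# Two well-separated clusters in projection space, one labeled seed in each.
pts = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.6), (10.0, 10.0), (10.4, 9.7), (9.8, 10.3)]
labels, conf = propagate_labels(pts, seeds={0: 0, 3: 1}, n_classes=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Clamping the seeds keeps the expert's labels fixed while the unlabeled points settle toward the labels of nearby seeds. Using the returned confidence (largest class probability) to decide which points an expert should inspect first is a plausible extension, but it is an assumption here rather than something the abstract describes.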
