Paper Title
Efficient Full Image Interactive Segmentation by Leveraging Within-image Appearance Similarity
Paper Authors
Paper Abstract
We propose a new approach to interactive full-image semantic segmentation which enables quickly collecting training data for new datasets with previously unseen semantic classes (a demo is available at https://youtu.be/yUk8D5gEX-o). We leverage a key observation: propagation from labeled to unlabeled pixels does not necessarily require class-specific knowledge, but can be done purely based on appearance similarity within an image. We build on this observation and propose an approach capable of jointly propagating pixel labels from multiple classes without explicit class-specific appearance models. To enable long-range propagation, our approach first globally measures appearance similarity between labeled and unlabeled pixels across the entire image. It then locally integrates these per-pixel measurements, which improves accuracy at boundaries and removes noisy label switches in homogeneous regions. We also design an efficient manual annotation interface that extends the traditional polygon drawing tool with a suite of additional convenient features, and we add automatic propagation on top of it. Experiments with human annotators on the COCO Panoptic Challenge dataset show that the combination of our improved manual interface and our novel automatic propagation mechanism reduces annotation time by more than a factor of 2 compared to polygon drawing. We also test our method on the ADE-20K and Fashionista datasets without any dataset-specific adaptation or retraining of our model, demonstrating that it generalizes to new datasets and visual classes.
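The abstract only outlines the two-stage propagation (a global appearance-similarity step followed by local integration of per-pixel measurements). The sketch below is a rough illustration of that idea, not the authors' actual method: it assumes plain RGB color as the appearance feature, brute-force nearest-neighbor matching to labeled pixels as the global step, and simple box-filter vote smoothing as the local step.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def propagate_labels(image, seed_mask, num_classes, window=5):
    """Illustrative label propagation (assumptions: RGB features, box-filter smoothing).

    image:     (H, W, 3) float array.
    seed_mask: (H, W) int array, -1 for unlabeled pixels, else a class id.
    Returns a (H, W) array of propagated class ids.
    """
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3)
    seeds = seed_mask.reshape(-1)

    labeled = seeds >= 0
    seed_colors = pixels[labeled]   # (N, 3) appearance of labeled pixels
    seed_labels = seeds[labeled]    # (N,)   their class ids

    # Global step: for every pixel, find the most similar labeled pixel
    # anywhere in the image (brute force here purely for clarity; a real
    # system would use an approximate nearest-neighbor index).
    dists = np.linalg.norm(pixels[:, None, :] - seed_colors[None, :, :], axis=2)
    hard_labels = seed_labels[np.argmin(dists, axis=1)].reshape(h, w)

    # Local step: integrate per-pixel votes in a small window to remove
    # noisy label switches inside homogeneous regions.
    scores = np.zeros((num_classes, h, w))
    for c in range(num_classes):
        scores[c] = uniform_filter((hard_labels == c).astype(float), size=window)
    return scores.argmax(axis=0)
```

The window size and the choice of RGB distance are illustrative parameters only; the paper's method measures appearance similarity and performs local integration in its own way.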