Paper Title
Learning Universal Shape Dictionary for Realtime Instance Segmentation
Paper Authors
Paper Abstract
We present a novel explicit shape representation for instance segmentation. Based on how the object shape is modeled, current instance segmentation systems can be divided into two categories: implicit and explicit models. The implicit methods, which represent the object mask/contour by intractable network parameters and produce it through pixel-wise classification, are predominant. Explicit methods, which parameterize the shape with simple and explainable models, are less explored. Since the operations to generate the final shape are lightweight, explicit methods have a clear speed advantage over implicit ones, which is crucial for real-world applications. The proposed USD-Seg adopts a linear model, sparse coding with a dictionary, for object shapes. First, it learns a dictionary from a large collection of shape datasets, so that any shape can be decomposed into a linear combination of dictionary atoms; hence the name "Universal Shape Dictionary". Then it adds a simple shape-vector regression head to an ordinary object detector, giving the detector segmentation ability with minimal overhead. For quantitative evaluation, we use both average precision (AP) and the proposed Efficiency of AP (AP$_E$) metric, which also measures the computational consumption of the framework to cater to the requirements of real-world applications. We report experimental results on the challenging COCO dataset, on which our single model on a single Titan Xp GPU achieves 35.8 AP and 27.8 AP$_E$ at 65 fps with YOLOv4 as base detector, and 34.1 AP and 28.6 AP$_E$ at 12 fps with FCOS as base detector.
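The core idea of the abstract — learn a dictionary from many shapes, encode any mask as a coefficient ("shape") vector over that dictionary, and reconstruct the mask as a linear combination of atoms — can be illustrated with a minimal numpy sketch. This is not the paper's implementation: it uses random rectangles as toy shapes and a truncated SVD as a dense stand-in for the paper's sparse dictionary learning; the mask size, dictionary size `k`, and all variable names are illustrative assumptions.

```python
import numpy as np

# Toy "shape dataset": each row is a flattened binary mask (16x16 here,
# random axis-aligned rectangles as stand-in shapes; all choices illustrative).
rng = np.random.default_rng(0)
n_shapes, side = 200, 16
masks = np.zeros((n_shapes, side * side))
for i in range(n_shapes):
    x0, y0 = rng.integers(0, side // 2, size=2)
    x1, y1 = rng.integers(side // 2, side, size=2)
    m = np.zeros((side, side))
    m[y0:y1, x0:x1] = 1.0
    masks[i] = m.ravel()

# "Dictionary": top-k right singular vectors of the centered shape matrix,
# a dense SVD stand-in for the sparse dictionary learning described above.
k = 32
mean_shape = masks.mean(0)
_, _, Vt = np.linalg.svd(masks - mean_shape, full_matrices=False)
dictionary = Vt[:k]                      # (k, side*side) dictionary atoms

# Encoding: the "shape vector" a detector head would regress is the
# least-squares coefficient vector of one mask over the dictionary.
target = masks[0] - mean_shape
coeffs, *_ = np.linalg.lstsq(dictionary.T, target, rcond=None)

# Decoding: a linear combination of atoms reconstructs the mask.
recon = coeffs @ dictionary + mean_shape
err = np.abs(recon - masks[0]).mean()
print(f"shape vector length: {coeffs.shape[0]}, mean abs error: {err:.3f}")
```

The cheap decoding step (one matrix-vector product per instance) is what gives explicit methods like this their speed advantage over per-pixel mask prediction.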