Paper Title


UnseenNet: Fast Training Detector for Any Unseen Concept

Authors

Asra Aslam, Edward Curry

Abstract


Training object detection models with less data is currently the focus of existing N-shot learning models in computer vision. Such methods rely on object-level labels and take hours to train on unseen classes. In many cases, a large number of image-level labels is available for training but cannot be utilized by few-shot object detection models. A machine learning framework is needed that can be trained on any unseen class and remain useful in real-time situations. In this paper, we propose an "Unseen Class Detector" that can be trained within a very short time for any possible unseen class, without bounding boxes, at competitive accuracy. We build our approach on "Strong" and "Weak" baseline detectors, which we train on existing object detection and image classification datasets, respectively. Unseen concepts are fine-tuned on the strong baseline detector using only image-level labels and further adapted by transferring classifier-detector knowledge between baselines. We use semantic as well as visual similarity to identify the source class (e.g., sheep) for the fine-tuning and adaptation of an unseen class (e.g., goat). Our model (UnseenNet) is trained on the ImageNet classification dataset for unseen classes and tested on an object detection dataset (OpenImages). UnseenNet improves the mean average precision (mAP) by 10% to 30% over existing object detection baselines (semi-supervised and few-shot) on different unseen class splits. Moreover, the training time of our model is less than 10 minutes per unseen class. Qualitative results demonstrate that UnseenNet is suitable not only for a few classes of Pascal VOC but for unseen classes of any dataset or the web. Code is available at https://github.com/Asra-Aslam/UnseenNet.
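The abstract describes choosing a source class (e.g., sheep) for an unseen class (e.g., goat) via semantic and visual similarity before fine-tuning and adaptation. Below is a minimal sketch of the semantic-similarity part only, assuming pre-computed embedding vectors for class names; the function names and the toy embedding values are illustrative assumptions and are not taken from the UnseenNet codebase.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def pick_source_class(unseen_class: str,
                      seen_classes: list[str],
                      embeddings: dict[str, np.ndarray]) -> str:
    """Return the seen (baseline) class whose embedding is closest to the
    unseen class; this class would then serve as the source for
    fine-tuning/adaptation in a pipeline like the one described above."""
    unseen_vec = embeddings[unseen_class]
    return max(seen_classes,
               key=lambda c: cosine_similarity(embeddings[c], unseen_vec))


# Toy 3-d "embeddings", made up purely for illustration.
embeddings = {
    "goat":  np.array([0.9, 0.1, 0.2]),
    "sheep": np.array([0.8, 0.2, 0.1]),
    "car":   np.array([0.1, 0.9, 0.7]),
}
print(pick_source_class("goat", ["sheep", "car"], embeddings))  # -> "sheep"
```

In practice the paper combines this kind of semantic cue with visual similarity; the sketch covers only the nearest-class selection step.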
