Paper Title

IDa-Det: An Information Discrepancy-aware Distillation for 1-bit Detectors

Authors

Sheng Xu, Yanjing Li, Bohan Zeng, Teli Ma, Baochang Zhang, Xianbin Cao, Peng Gao, Jinhu Lv

Abstract

Knowledge distillation (KD) has been proven useful for training compact object detection models. However, we observe that KD is often effective only when the teacher model and its student counterpart share similar proposal information. This explains why existing KD methods are less effective for 1-bit detectors: there is a significant information discrepancy between the real-valued teacher and the 1-bit student. This paper presents an Information Discrepancy-aware strategy (IDa-Det) for distilling 1-bit detectors, which effectively eliminates the information discrepancy and significantly reduces the performance gap between a 1-bit detector and its real-valued counterpart. We formulate the distillation process as a bi-level optimization problem. At the inner level, we select the representative proposals with maximum information discrepancy. We then introduce a novel entropy distillation loss to reduce the disparity based on the selected proposals. Extensive experiments demonstrate IDa-Det's superiority over state-of-the-art 1-bit detectors and KD methods on both the PASCAL VOC and COCO datasets. IDa-Det achieves 76.9% mAP for a 1-bit Faster R-CNN with a ResNet-18 backbone. Our code is open-sourced at https://github.com/SteveTsui/IDa-Det.
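
The abstract describes two coupled steps: an inner step that selects the teacher-student proposal pairs with the largest information discrepancy, and an outer step that distills on those proposals with an entropy-based loss. Below is a minimal PyTorch sketch of that flow, not the paper's implementation; the Gaussian symmetric-KL discrepancy proxy, the soft cross-entropy "entropy" loss, and names such as proposal_discrepancy, entropy_distill_loss, and top_k are illustrative assumptions.

```python
# Hedged sketch of (1) selecting the most-discrepant proposal pairs and
# (2) applying an entropy-style distillation loss on them.
import torch
import torch.nn.functional as F


def proposal_discrepancy(t_feat: torch.Tensor, s_feat: torch.Tensor) -> torch.Tensor:
    """Per-proposal discrepancy between teacher and student RoI features.

    t_feat, s_feat: [N, D] flattened features for N matched proposals.
    Each proposal's feature is modeled as a 1-D Gaussian (mean/variance over
    channels); the symmetric KL between the two Gaussians serves as a simple
    proxy for the information discrepancy mentioned in the abstract.
    """
    t_mu, t_var = t_feat.mean(dim=1), t_feat.var(dim=1) + 1e-6
    s_mu, s_var = s_feat.mean(dim=1), s_feat.var(dim=1) + 1e-6
    kl_ts = 0.5 * (s_var / t_var + (t_mu - s_mu) ** 2 / t_var - 1 + torch.log(t_var / s_var))
    kl_st = 0.5 * (t_var / s_var + (s_mu - t_mu) ** 2 / s_var - 1 + torch.log(s_var / t_var))
    return kl_ts + kl_st  # shape [N]


def entropy_distill_loss(t_feat: torch.Tensor, s_feat: torch.Tensor, top_k: int = 8) -> torch.Tensor:
    """Inner step: keep the top_k proposals with maximum discrepancy.
    Outer step: minimize a soft cross-entropy between the teacher and student
    channel distributions on those proposals (an entropy-style loss)."""
    scores = proposal_discrepancy(t_feat, s_feat)
    idx = scores.topk(min(top_k, scores.numel())).indices
    t_prob = F.softmax(t_feat[idx], dim=1)
    s_logp = F.log_softmax(s_feat[idx], dim=1)
    return -(t_prob * s_logp).sum(dim=1).mean()


if __name__ == "__main__":
    teacher = torch.randn(64, 256)                       # e.g. 64 proposals, 256-d RoI features
    student = torch.randn(64, 256, requires_grad=True)   # stand-in for 1-bit student features
    loss = entropy_distill_loss(teacher, student)
    loss.backward()
    print(float(loss))
```

In a real detector this loss would be added to the standard detection losses, with teacher and student features taken from matched RoIs; the feature shapes and the choice of top_k here are placeholders.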
