MTTrans: Cross-Domain Object Detection with Mean-Teacher Transformer

Paper Title

MTTrans: Cross-Domain Object Detection with Mean-Teacher Transformer

Authors

Jinze Yu, Jiaming Liu, Xiaobao Wei, Haoyi Zhou, Yohei Nakata, Denis Gudovskiy, Tomoyuki Okuno, Jianxin Li, Kurt Keutzer, Shanghang Zhang

Abstract

Recently, DEtection TRansformer (DETR), an end-to-end object detection pipeline, has achieved promising performance. However, it requires large-scale labeled data and suffers from domain shift, especially when no labeled data is available in the target domain. To solve this problem, we propose an end-to-end cross-domain detection Transformer based on the mean teacher framework, MTTrans, which can fully exploit unlabeled target domain data in object detection training and transfer knowledge between domains via pseudo labels. We further propose the comprehensive multi-level feature alignment to improve the pseudo labels generated by the mean teacher framework taking advantage of the cross-scale self-attention mechanism in Deformable DETR. Image and object features are aligned at the local, global, and instance levels with domain query-based feature alignment (DQFA), bi-level graph-based prototype alignment (BGPA), and token-wise image feature alignment (TIFA). On the other hand, the unlabeled target domain data pseudo-labeled and available for the object detection training by the mean teacher framework can lead to better feature extraction and alignment. Thus, the mean teacher framework and the comprehensive multi-level feature alignment can be optimized iteratively and mutually based on the architecture of Transformers. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance in three domain adaptation scenarios, especially the result of Sim10k to Cityscapes scenario is remarkably improved from 52.6 mAP to 57.9 mAP. Code will be released.
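
The abstract gives no implementation details, so the following is only a minimal sketch of the two generic mechanics a mean teacher setup usually relies on: the teacher's weights track the student's through an exponential moving average (EMA), and teacher detections on unlabeled target images are kept as pseudo labels when their confidence is high enough. The function names, the plain-dictionary weight representation, the decay of 0.999, and the threshold of 0.5 are all illustrative assumptions, not values or code from MTTrans.

```python
from typing import Dict, List, Tuple

# (class label, confidence score); a hypothetical, simplified detection record.
Detection = Tuple[str, float]


def ema_update(
    teacher: Dict[str, float],
    student: Dict[str, float],
    decay: float = 0.999,  # assumed value, not taken from the paper
) -> Dict[str, float]:
    """Return new teacher weights as an EMA of the student weights."""
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


def select_pseudo_labels(
    detections: List[Detection],
    threshold: float = 0.5,  # assumed value, not taken from the paper
) -> List[Detection]:
    """Keep only teacher detections confident enough to act as pseudo labels."""
    return [det for det in detections if det[1] >= threshold]


if __name__ == "__main__":
    teacher = {"w": 1.0}
    student = {"w": 0.0}
    # The teacher moves a small step toward the student.
    print(ema_update(teacher, student, decay=0.9))
    # Only the confident detection survives the filter.
    print(select_pseudo_labels([("car", 0.9), ("car", 0.3)]))
```

In a real detector the weights would be tensors and the pseudo labels would include bounding boxes, but the update rule and the filtering step keep the same shape.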
