Paper Title
Cascade Transformers for End-to-End Person Search
Paper Authors
Paper Abstract
The goal of person search is to localize a target person from a gallery set of scene images, which is extremely challenging due to large scale variations, pose/viewpoint changes, and occlusions. In this paper, we propose the Cascade Occluded Attention Transformer (COAT) for end-to-end person search. Our three-stage cascade design focuses on detecting people in the first stage, while later stages simultaneously and progressively refine the representation for person detection and re-identification. At each stage, the occluded attention transformer applies tighter intersection over union thresholds, forcing the network to learn coarse-to-fine pose/scale invariant features. Meanwhile, we calculate each detection's occluded attention to differentiate a person's tokens from other people or the background. In this way, we simulate the effect of other objects occluding a person of interest at the token level. Through comprehensive experiments, we demonstrate the benefits of our method by achieving state-of-the-art performance on two benchmark datasets.
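The idea of applying tighter intersection over union (IoU) thresholds at each cascade stage can be illustrated with a minimal sketch. This is not the authors' implementation: the threshold values (0.5, 0.6, 0.7), the function names `iou` and `positives_per_stage`, and the toy boxes are all assumptions chosen for illustration. A real cascade would also refine the boxes between stages, which this sketch omits.

```python
from typing import List, Sequence, Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positives_per_stage(
    proposals: Sequence[Box],
    gt: Box,
    thresholds: Sequence[float] = (0.5, 0.6, 0.7),  # assumed values
) -> List[List[Box]]:
    """For each stage, keep the proposals whose IoU with the ground-truth
    box meets that stage's threshold. Later stages use tighter thresholds,
    so they see fewer but better-aligned positives."""
    return [[p for p in proposals if iou(p, gt) >= t] for t in thresholds]


# Toy example: one ground-truth person box and four proposals.
gt_box: Box = (0.0, 0.0, 10.0, 10.0)
proposals: List[Box] = [
    (0.0, 0.0, 10.0, 10.0),   # IoU 1.00
    (0.0, 0.0, 10.0, 6.5),    # IoU 0.65
    (0.0, 0.0, 10.0, 5.5),    # IoU 0.55
    (5.0, 5.0, 15.0, 15.0),   # IoU ~0.14
]
stages = positives_per_stage(proposals, gt_box)
stage_counts = [len(s) for s in stages]
```

In this toy setting the set of positive proposals shrinks from stage to stage, which mirrors the coarse-to-fine behaviour the abstract describes: early stages accept loosely aligned detections, while later stages train on tighter ones.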