Paper Title
Feature Disentanglement Learning with Switching and Aggregation for Video-based Person Re-Identification
Paper Authors
Paper Abstract
In video person re-identification (Re-ID), the network must consistently extract features of the target person from successive frames. Existing methods tend to focus only on how to use temporal information, which often leads to networks being fooled by similar appearances and identical backgrounds. In this paper, we propose a Disentanglement and Switching and Aggregation Network (DSANet), which separates identity-related features from features based on camera characteristics and pays more attention to ID information. We also introduce an auxiliary task that utilizes a new pair of features created through switching and aggregation to increase the network's capability across various camera scenarios. Furthermore, we devise a Target Localization Module (TLM), which extracts features robust to changes in the target's position over the frame flow, and a Frame Weight Generation (FWG) module, which reflects temporal information in the final representation. Various loss functions for disentanglement learning are designed so that each component of the network can cooperate while satisfactorily performing its own role. Quantitative and qualitative results from extensive experiments demonstrate the superiority of DSANet over state-of-the-art methods on three benchmark datasets.
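The switching-and-aggregation auxiliary task described above can be illustrated with a minimal sketch: once a sample's representation is disentangled into an identity part and a camera part, the camera parts of two samples are swapped and re-aggregated to synthesize "same identity, different camera" features. This is only an illustration under an assumed additive recombination; the function names and the use of plain NumPy vectors are hypothetical and do not reflect the authors' actual implementation.

```python
import numpy as np

def switch_and_aggregate(id_a, cam_a, id_b, cam_b):
    """Recombine disentangled features by switching camera parts.

    id_a, id_b  : identity-related feature vectors of samples A and B
    cam_a, cam_b: camera-characteristic feature vectors of A and B

    Each identity feature is paired with the *other* sample's camera
    feature and the two parts are aggregated (here, simply summed,
    an assumed choice) into a new synthetic representation.
    """
    new_a = id_a + cam_b  # identity of A under camera style of B
    new_b = id_b + cam_a  # identity of B under camera style of A
    return new_a, new_b

# Toy example with 2-D features: the identity component survives the
# switch while the camera component is exchanged.
a_id, a_cam = np.array([1.0, 0.0]), np.array([0.0, 1.0])
b_id, b_cam = np.array([2.0, 0.0]), np.array([0.0, 3.0])
new_a, new_b = switch_and_aggregate(a_id, a_cam, b_id, b_cam)
```

Training on such recombined features (with appropriate losses) pushes the identity branch to stay invariant to camera characteristics, which is the stated goal of the auxiliary task.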