Paper Title
Prompting for Multi-Modal Tracking
Paper Authors
Paper Abstract
Multi-modal tracking has gained attention due to its greater accuracy and robustness in complex scenarios compared to traditional RGB-based tracking. The key lies in how to fuse multi-modal data and bridge the gap between modalities. However, multi-modal tracking still suffers severely from data deficiency, which results in insufficient learning of fusion modules. Instead of building such a fusion module, in this paper we provide a new perspective on multi-modal tracking by attaching importance to multi-modal visual prompts. We design a novel multi-modal prompt tracker (ProTrack), which transfers the multi-modal inputs to a single modality via the prompt paradigm. By fully exploiting the tracking ability of RGB trackers pre-trained at scale, our ProTrack can achieve high-performance multi-modal tracking by only altering the inputs, even without any extra training on multi-modal data. Extensive experiments on 5 benchmark datasets demonstrate the effectiveness of the proposed ProTrack.
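The core idea of "altering the inputs" can be sketched as follows. Assuming the prompt is a simple pixel-level convex combination of the RGB frame and an aligned auxiliary-modality frame (an illustrative assumption; the abstract does not specify the exact prompt form), the fused frame can then be fed unchanged to any frozen, pre-trained RGB tracker:

```python
import numpy as np

def modal_prompt(rgb, aux, lam=0.6):
    """Hypothetical multi-modal visual prompt: fuse an RGB frame with an
    aligned auxiliary-modality frame (e.g. thermal or depth) into a single
    RGB-like input.

    rgb, aux: float arrays in [0, 1] with shape (H, W, 3)
    lam:      assumed hyper-parameter weighting the RGB modality
    """
    # Convex combination keeps the result in the tracker's expected
    # input range, so no retraining of the RGB tracker is required.
    return lam * rgb + (1.0 - lam) * aux

# Example: fuse a dummy RGB frame with a dummy (replicated) thermal frame.
rgb = np.random.rand(8, 8, 3)
thermal = np.broadcast_to(np.random.rand(8, 8, 1), (8, 8, 3)).copy()
prompted = modal_prompt(rgb, thermal)
```

The output `prompted` has the same shape and value range as an ordinary RGB frame, which is what lets a pre-trained single-modality tracker consume multi-modal data without any architectural change.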