Paper Title
TR-MOT: Multi-Object Tracking by Reference
Paper Authors
Paper Abstract
Multi-object tracking (MOT) can generally be split into two sub-tasks, i.e., detection and association. Many previous methods follow the tracking-by-detection paradigm, which first obtains detections at each frame and then associates them between adjacent frames. Although such methods achieve impressive performance by using a strong detector, their detection and association performance degrades in scenes with heavy occlusion and large motion if temporal information is not used. In this paper, we propose a novel Reference Search (RS) module that provides more reliable association based on the deformable transformer structure, which naturally learns the feature alignment of each object across frames. RS takes previously detected results as references to aggregate the corresponding features from the combined features of adjacent frames, and makes a one-to-one track state prediction for each reference in parallel. Therefore, RS can attain reliable association that copes with unexpected motion by leveraging visual temporal features, while maintaining strong detection performance by being decoupled from the detector. Our RS module is also compatible with other tracking-by-detection frameworks. Furthermore, we propose a joint training strategy and an effective matching pipeline for our online MOT framework with the RS module. Our method achieves competitive results on the MOT17 and MOT20 datasets.
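The abstract only sketches the RS module's data flow: reference embeddings derived from previous detections attend to the combined features of two adjacent frames, and each reference yields one track state prediction. Below is a minimal PyTorch sketch of that flow, written as an illustration rather than the paper's implementation. All names (`ReferenceSearch`, `box_head`, `score_head`) are hypothetical, and standard multi-head cross-attention stands in for the deformable attention used in the paper.

```python
import torch
import torch.nn as nn


class ReferenceSearch(nn.Module):
    """Hypothetical sketch of a Reference-Search-style module.

    References (embeddings of previous detections) attend to the combined
    features of two adjacent frames and produce one track-state prediction
    per reference. Standard cross-attention is used here as a stand-in for
    the deformable attention described in the paper.
    """

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.ReLU(), nn.Linear(dim * 4, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        # One-to-one heads: a box and a confidence (track state) per reference.
        self.box_head = nn.Linear(dim, 4)
        self.score_head = nn.Linear(dim, 1)

    def forward(self, refs, frame_feats_prev, frame_feats_cur):
        # refs:             (B, N, dim) embeddings of previous detections (references)
        # frame_feats_*:    (B, L, dim) flattened feature maps of the adjacent frames
        memory = torch.cat([frame_feats_prev, frame_feats_cur], dim=1)  # combined features
        attended, _ = self.cross_attn(refs, memory, memory)             # aggregate per reference
        x = self.norm1(refs + attended)
        x = self.norm2(x + self.ffn(x))
        boxes = self.box_head(x)               # per-reference box prediction
        scores = self.score_head(x).sigmoid()  # per-reference track confidence
        return boxes, scores


if __name__ == "__main__":
    # Usage sketch with dummy tensors.
    rs = ReferenceSearch()
    refs = torch.randn(2, 10, 256)   # 10 references (previous detections)
    prev = torch.randn(2, 900, 256)  # flattened features of frame t-1
    cur = torch.randn(2, 900, 256)   # flattened features of frame t
    boxes, scores = rs(refs, prev, cur)
    print(boxes.shape, scores.shape)  # (2, 10, 4) (2, 10, 1)
```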