Paper Title
Relation3DMOT: Exploiting Deep Affinity for 3D Multi-Object Tracking from View Aggregation
Paper Authors
Paper Abstract
Autonomous systems need to localize and track surrounding objects in 3D space for safe motion planning. As a result, 3D multi-object tracking (MOT) plays a vital role in autonomous navigation. Most MOT methods use a tracking-by-detection pipeline, which consists of object detection followed by data association. However, many approaches detect objects in 2D RGB sequences for tracking, which lacks reliability when localizing objects in 3D space. Furthermore, it remains challenging to learn discriminative features for temporally consistent detections across frames, and the affinity matrix is normally learned from independent object features without considering the feature interactions between objects detected in different frames. To address these problems, we first employ a joint feature extractor to fuse the 2D and 3D appearance features captured from 2D RGB images and 3D point clouds, respectively, and then propose a novel convolutional operation, named RelationConv, to better exploit the correlation between each pair of objects in adjacent frames and learn a deep affinity matrix for subsequent data association. We finally provide an extensive evaluation showing that our proposed model achieves state-of-the-art performance on the KITTI tracking benchmark.
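To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch of (a) fusing per-detection 2D and 3D appearance features into one embedding, and (b) scoring every (track, detection) pair jointly with shared 1x1 convolutions over the pair grid, rather than comparing independent embeddings. All layer names, feature dimensions, and the exact pairwise formulation are illustrative assumptions; the paper defines the actual RelationConv operation.

```python
# Hedged sketch of the abstract's pipeline: joint 2D/3D feature fusion and a
# pairwise affinity head. Shapes and layer sizes are assumptions, not the
# paper's actual architecture.
import torch
import torch.nn as nn


class JointFeatureFusion(nn.Module):
    """Fuses a 2D appearance feature (e.g. from an RGB crop) with a 3D
    feature (e.g. from the object's point cloud) into one embedding
    per detection."""

    def __init__(self, dim_2d=256, dim_3d=256, dim_out=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(dim_2d + dim_3d, dim_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat_2d, feat_3d):
        # feat_2d: (N, dim_2d), feat_3d: (N, dim_3d) for N detections.
        return self.fuse(torch.cat([feat_2d, feat_3d], dim=-1))


class PairwiseAffinityHead(nn.Module):
    """Scores every (track, detection) pair from its concatenated features,
    so the affinity depends on the pair jointly rather than on two
    independently embedded objects."""

    def __init__(self, dim=256):
        super().__init__()
        # 1x1 convolutions over the (M, N) pair grid apply the same small
        # MLP to each pair's concatenated features, sharing weights
        # across all pairs.
        self.relation = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, 1, kernel_size=1),
        )

    def forward(self, tracks, dets):
        # tracks: (M, dim) from frame t-1; dets: (N, dim) from frame t.
        M, N = tracks.size(0), dets.size(0)
        t = tracks.unsqueeze(1).expand(M, N, -1)      # (M, N, dim)
        d = dets.unsqueeze(0).expand(M, N, -1)        # (M, N, dim)
        pairs = torch.cat([t, d], dim=-1)             # (M, N, 2*dim)
        pairs = pairs.permute(2, 0, 1).unsqueeze(0)   # (1, 2*dim, M, N)
        return self.relation(pairs).squeeze(0).squeeze(0)  # (M, N) affinities
```

In a tracking-by-detection loop, the resulting (M, N) affinity matrix would feed a matching step (e.g. Hungarian assignment) to associate detections across adjacent frames; that downstream step is standard and not specific to this paper.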