Paper Title
GNN3DMOT: Graph Neural Network for 3D Multi-Object Tracking with Multi-Feature Learning
Paper Authors
Xinshuo Weng, Yongxin Wang, Yunze Man, Kris Kitani
Paper Abstract
3D multi-object tracking (MOT) is crucial to autonomous systems. Recent work uses a standard tracking-by-detection pipeline, where feature extraction is first performed independently for each object in order to compute an affinity matrix. Then the affinity matrix is passed to the Hungarian algorithm for data association. A key step in this standard pipeline is learning discriminative features for different objects in order to reduce confusion during data association. In this work, we propose two techniques to improve discriminative feature learning for MOT: (1) instead of obtaining features for each object independently, we propose a novel feature interaction mechanism by introducing a Graph Neural Network. As a result, the feature of one object is informed by the features of other objects, so that an object's feature can lean towards objects with similar features (i.e., objects that probably share the same ID) and deviate from objects with dissimilar features (i.e., objects that probably have different IDs), leading to a more discriminative feature for each object; (2) instead of obtaining features from either 2D or 3D space as in prior work, we propose a novel joint feature extractor to learn appearance and motion features from 2D and 3D space simultaneously. As features from different modalities often carry complementary information, the joint feature can be more discriminative than features from any individual modality. To ensure that the joint feature extractor does not rely too heavily on one modality, we also propose an ensemble training paradigm. Through extensive evaluation, our proposed method achieves state-of-the-art performance on the KITTI and nuScenes 3D MOT benchmarks. Our code will be made available at https://github.com/xinshuoweng/GNN3DMOT
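To make the described pipeline concrete, below is a minimal sketch, not the authors' implementation: all module names, feature dimensions, and the attention-style message aggregation are illustrative assumptions (see the official release at https://github.com/xinshuoweng/GNN3DMOT). It covers the three pieces the abstract walks through: a joint 2D/3D feature extractor, one round of GNN feature interaction, and affinity-based data association with the Hungarian algorithm.

```python
# Minimal sketch of the abstract's pipeline, NOT the authors' code.
# Module names, dimensions, and the attention-style aggregation are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

class JointFeatureExtractor(nn.Module):
    """Fuses 2D and 3D appearance/motion cues into one joint feature (hypothetical layout)."""
    def __init__(self, d2d=64, d3d=64, dout=128):
        super().__init__()
        self.fuse = nn.Linear(d2d + d3d, dout)

    def forward(self, feat_2d, feat_3d):
        # feat_2d: (N, d2d) e.g. image-crop appearance; feat_3d: (N, d3d) e.g. LiDAR/motion
        return torch.relu(self.fuse(torch.cat([feat_2d, feat_3d], dim=-1)))

class FeatureInteraction(nn.Module):
    """One GNN message-passing round: each object's feature is updated from all others."""
    def __init__(self, d=128):
        super().__init__()
        self.msg = nn.Linear(d, d)
        self.upd = nn.Linear(2 * d, d)

    def forward(self, x):
        # x: (N, d) features of all tracks and detections in the graph
        w = torch.softmax(x @ x.t() / x.shape[-1] ** 0.5, dim=-1)  # pairwise attention weights
        agg = w @ self.msg(x)                                      # aggregate weighted messages
        return torch.relu(self.upd(torch.cat([x, agg], dim=-1)))  # update with own + aggregated

def associate(track_feats, det_feats):
    """Cosine affinity matrix + Hungarian algorithm, as in the standard pipeline."""
    t = F.normalize(track_feats, dim=-1)
    d = F.normalize(det_feats, dim=-1)
    affinity = (t @ d.t()).detach().cpu().numpy()   # (num_tracks, num_dets)
    rows, cols = linear_sum_assignment(-affinity)   # negate to maximize affinity
    return list(zip(rows.tolist(), cols.tolist()))

# Toy usage: 3 existing tracks vs. 4 new detections, features interacting in one graph.
extractor, gnn = JointFeatureExtractor(), FeatureInteraction()
feats = gnn(extractor(torch.randn(7, 64), torch.randn(7, 64)))
print(associate(feats[:3], feats[3:]))  # e.g. [(0, 2), (1, 0), (2, 3)]
```

In the sketch, the interaction layer is what lets features "lean towards" similar objects and "deviate from" dissimilar ones: the attention weights pull each node's update towards high-affinity neighbors before the affinity matrix is ever computed.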