Perceive, Interact, Predict: Learning Dynamic and Static Clues for End-to-End Motion Prediction

Paper Title

Perceive, Interact, Predict: Learning Dynamic and Static Clues for End-to-End Motion Prediction

Authors

Bo Jiang, Shaoyu Chen, Xinggang Wang, Bencheng Liao, Tianheng Cheng, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang

Abstract

Motion prediction is highly relevant to the perception of dynamic objects and static map elements in autonomous driving scenarios. In this work, we propose PIP, the first end-to-end Transformer-based framework that jointly and interactively performs online mapping, object detection and motion prediction. PIP leverages map queries, agent queries and mode queries to encode the instance-wise information of map elements, agents and motion intentions, respectively. Based on the unified query representation, a differentiable multi-task interaction scheme is proposed to exploit the correlation between perception and prediction. Even without human-annotated HD maps or agents' historical tracking trajectories as guidance information, PIP realizes end-to-end multi-agent motion prediction and achieves better performance than tracking-based and HD-map-based methods. PIP provides comprehensive high-level information about the driving scene (a vectorized static map and dynamic objects with motion information), and contributes to downstream planning and control. Code and models will be released to facilitate further research.
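
The query-based design described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The embedding size, the query counts, the random embeddings, the single-head attention without learned projections, and the additive pairing of agent and mode queries are all assumptions made for illustration. It only shows how three query sets could exchange information and yield one motion embedding per agent and mode.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys):
    """Single-head dot-product attention with a residual connection.

    Assumption: the keys double as values and there are no learned projections.
    """
    dim = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        attended = [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
        out.append([qi + ai for qi, ai in zip(q, attended)])
    return out


def random_queries(rng, count, dim):
    """Stand-in for learned query embeddings (hypothetical initialization)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


def motion_queries(agent_queries, mode_queries):
    """Pair every agent with every mode by adding the two embeddings (assumption)."""
    return [
        [[a + m for a, m in zip(agent, mode)] for mode in mode_queries]
        for agent in agent_queries
    ]


rng = random.Random(0)
DIM, NUM_MAP, NUM_AGENT, NUM_MODE = 8, 5, 3, 4  # hypothetical sizes

map_q = random_queries(rng, NUM_MAP, DIM)      # one query per map element
agent_q = random_queries(rng, NUM_AGENT, DIM)  # one query per agent
mode_q = random_queries(rng, NUM_MODE, DIM)    # one query per motion intention

agent_q = cross_attend(agent_q, map_q)      # agents gather static map context
agent_q = cross_attend(agent_q, agent_q)    # agents gather context from each other
motion_q = motion_queries(agent_q, mode_q)  # indexed as [agent][mode][dim]

print(len(motion_q), len(motion_q[0]), len(motion_q[0][0]))
```

In a full model each motion embedding would be decoded into a future trajectory, and every step would be trained jointly. The sketch stops at the embedding stage because the abstract gives no decoder details.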
