Paper Title
Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark
Paper Authors
Paper Abstract
In recent years, vision-centric perception has flourished in various autonomous driving tasks, including 3D detection, semantic map construction, motion forecasting, and depth estimation. Nevertheless, the latency of vision-centric approaches is too high for practical deployment (e.g., most camera-based 3D detectors have runtimes greater than 300 ms). To bridge the gap between idealized research and real-world applications, it is necessary to quantify the trade-off between performance and efficiency. Traditionally, autonomous-driving perception benchmarks perform offline evaluation, neglecting the inference-time delay. To mitigate this problem, we propose the Autonomous-driving StreAming Perception (ASAP) benchmark, the first benchmark to evaluate the online performance of vision-centric perception in autonomous driving. Building on the 2 Hz-annotated nuScenes dataset, we first propose an annotation-extending pipeline to generate high-frame-rate labels for the 12 Hz raw images. With reference to practical deployment, we further construct the Streaming Perception Under constRained-computation (SPUR) evaluation protocol, where the 12 Hz inputs are used for streaming evaluation under the constraints of different computational resources. On the ASAP benchmark, comprehensive experimental results reveal that model rankings change under different computation constraints, suggesting that model latency and computation budget should be treated as design choices when optimizing for practical deployment. To facilitate further research, we establish baselines for camera-based streaming 3D detection, which consistently enhance streaming performance across various hardware. ASAP project page: https://github.com/JeffWang987/ASAP.
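To illustrate the streaming-evaluation idea the abstract describes, the following is a minimal sketch (hypothetical code, not from the ASAP repository, with assumed frame rate and latency values): frames arrive at 12 Hz, a detector takes a fixed latency per frame, and at each query time the evaluator can only score the most recent prediction that has already finished, so a slow model is compared against newer ground truth than the frame it actually processed.

```python
# Hypothetical sketch of streaming-perception matching: at each query
# timestamp, find the most recent input frame whose prediction has
# completed. Inference latency therefore translates into a frame lag.

def streaming_matches(num_frames, frame_interval_s, latency_s):
    """Return (query_frame, input_frame) pairs: for each query time,
    the index of the latest frame whose prediction finished in time.
    An input index of -1 means no prediction is ready yet."""
    matches = []
    for q in range(num_frames):
        t_query = q * frame_interval_s
        # Latest frame i with i * frame_interval_s + latency_s <= t_query.
        i = int((t_query - latency_s) // frame_interval_s)
        matches.append((q, max(i, -1)))
    return matches

# A 300 ms detector on a 12 Hz stream (~83 ms per frame) lags ~4 frames:
dt = 1.0 / 12.0
pairs = streaming_matches(6, dt, 0.3)
print(pairs)  # early queries have no finished prediction to score
```

This frame lag is why offline metrics overestimate deployed accuracy: the scene has moved by several frames before a slow model's output becomes available, which is exactly the effect a streaming protocol like SPUR penalizes.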