Paper Title

Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders

Paper Authors

Chen Min, Xinli Xu, Dawei Zhao, Liang Xiao, Yiming Nie, Bin Dai

Paper Abstract

Current perception models in autonomous driving heavily rely on large-scale labelled 3D data, which is both costly and time-consuming to annotate. This work proposes a solution to reduce the dependence on labelled 3D training data by leveraging pre-training on large-scale unlabeled outdoor LiDAR point clouds using masked autoencoders (MAE). While existing masked point autoencoding methods mainly focus on small-scale indoor point clouds or pillar-based large-scale outdoor LiDAR data, our approach introduces a new self-supervised masked occupancy pre-training method called Occupancy-MAE, specifically designed for voxel-based large-scale outdoor LiDAR point clouds. Occupancy-MAE takes advantage of the voxel occupancy structure of outdoor LiDAR point clouds, which becomes progressively sparser with distance from the sensor, and incorporates a range-aware random masking strategy and a pretext task of occupancy prediction. By randomly masking voxels based on their distance to the LiDAR and predicting the masked occupancy structure of the entire 3D surrounding scene, Occupancy-MAE encourages the extraction of high-level semantic information to reconstruct the masked voxels from only a small number of visible ones. Extensive experiments demonstrate the effectiveness of Occupancy-MAE across several downstream tasks. For 3D object detection, Occupancy-MAE halves the labelled data required for car detection on the KITTI dataset and improves small object detection by approximately 2% AP on the Waymo dataset. For 3D semantic segmentation, Occupancy-MAE outperforms training from scratch by around 2% in mIoU. For multi-object tracking, Occupancy-MAE improves over training from scratch by approximately 1% in terms of AMOTA and AMOTP. Code is publicly available at https://github.com/chaytonmin/Occupancy-MAE.
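The abstract's two ingredients, range-aware random masking and the occupancy-prediction pretext task, can be made concrete with a short sketch. The code below is a minimal illustration under stated assumptions, not the authors' released implementation: the function names, the range bands, and the per-band masking ratios are hypothetical values chosen for readability.

```python
# Minimal sketch (not the authors' code) of two ideas from the abstract:
# (1) range-aware random masking of occupied voxels, and
# (2) a dense binary occupancy grid as the pretext-task target.
# Range bands and masking ratios below are illustrative assumptions.
import numpy as np

def range_aware_random_mask(voxel_coords, voxel_size, lidar_origin,
                            range_bands=(30.0, 50.0),
                            mask_ratios=(0.9, 0.7, 0.5),
                            rng=None):
    """Mask occupied voxels with a ratio that decreases with distance
    from the LiDAR: near regions are dense and tolerate heavy masking,
    while distant regions are sparse, so more voxels stay visible.

    voxel_coords: (N, 3) integer indices of occupied voxels.
    Returns a boolean array: True = visible, False = masked.
    """
    rng = rng or np.random.default_rng()
    centers = (voxel_coords + 0.5) * voxel_size        # metric voxel centers
    dist = np.linalg.norm(centers - lidar_origin, axis=1)

    ratio = np.full(dist.shape, mask_ratios[0])        # near band
    ratio[dist >= range_bands[0]] = mask_ratios[1]     # middle band
    ratio[dist >= range_bands[1]] = mask_ratios[2]     # far band
    return rng.random(dist.shape) >= ratio

def occupancy_target(voxel_coords, grid_shape):
    """Dense binary occupancy grid used as the pretext-task label:
    the decoder must predict 1 for every originally occupied voxel
    (masked or visible) and 0 everywhere else."""
    target = np.zeros(grid_shape, dtype=np.float32)
    target[tuple(voxel_coords.T)] = 1.0
    return target

# Toy usage: two occupied voxels, 0.4 m voxels, sensor at the origin.
coords = np.array([[10, 20, 5], [200, 150, 8]])
visible = range_aware_random_mask(coords, voxel_size=0.4,
                                  lidar_origin=np.zeros(3))
```

The intuition behind the decreasing ratios is that near the sensor the occupancy grid is dense and redundant, so heavy masking still leaves enough context for reconstruction, whereas far from the sensor voxels are scarce and each one carries more evidence. The actual bands and ratios used by Occupancy-MAE should be taken from the paper or the linked repository.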
