Paper Title
3D-OAE: Occlusion Auto-Encoders for Self-Supervised Learning on Point Clouds
Paper Authors
Paper Abstract
Manual annotation of large-scale point clouds remains tedious and is often unavailable for many harsh real-world tasks. Self-supervised learning, which pre-trains deep neural networks on raw, unlabeled data, is a promising approach to address this issue. Existing works typically rely on auto-encoders and establish self-supervision through a self-reconstruction scheme. However, previous auto-encoders focus only on global shapes and do not distinguish local geometric features from global ones. To address this problem, we present a novel and efficient self-supervised point cloud representation learning framework, named 3D Occlusion Auto-Encoder (3D-OAE), which exploits the detailed supervision inherent in both local regions and global shapes. We propose to randomly occlude some local patches of a point cloud and establish supervision by inpainting the occluded patches from the remaining ones. Specifically, we design an asymmetric encoder-decoder architecture based on the standard Transformer, where the encoder operates only on the visible subset of patches to learn local patterns, and a lightweight decoder leverages these visible patterns to infer the missing geometry via self-attention. We find that occluding a very high proportion of the input point cloud (e.g., 75%) still yields non-trivial self-supervisory performance, which makes training 3-4 times faster while also improving accuracy. Experimental results show that our approach outperforms state-of-the-art methods on a diverse range of downstream discriminative and generative tasks.
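To make the occlude-and-inpaint scheme concrete, below is a minimal PyTorch sketch of the idea described in the abstract, not the authors' released implementation. The helper `group_points` (random centers plus k-NN instead of farthest-point sampling), the token dimensions, the layer counts, and the MSE placeholder loss (a point-set distance such as Chamfer would be used in practice) are all illustrative assumptions; only the overall structure (group into patches, occlude a high ratio, encode visible patches only, decode occluded positions with a lightweight decoder) follows the abstract.

```python
import torch
import torch.nn as nn

def group_points(points, num_patches=64, patch_size=32):
    """Split a point cloud (B, N, 3) into local patches around sampled centers.
    A simple stand-in for FPS + k-NN grouping."""
    B, N, _ = points.shape
    idx = torch.stack([torch.randperm(N, device=points.device)[:num_patches]
                       for _ in range(B)])                                   # (B, P)
    centers = torch.gather(points, 1, idx.unsqueeze(-1).expand(-1, -1, 3))   # (B, P, 3)
    dists = torch.cdist(centers, points)                                     # (B, P, N)
    knn = dists.topk(patch_size, largest=False).indices                      # (B, P, K)
    patches = torch.gather(points.unsqueeze(1).expand(-1, num_patches, -1, -1), 2,
                           knn.unsqueeze(-1).expand(-1, -1, -1, 3))          # (B, P, K, 3)
    return patches - centers.unsqueeze(2), centers                           # local coords

class OcclusionAutoEncoder(nn.Module):
    """Asymmetric Transformer: the encoder sees only visible patches,
    a lightweight decoder inpaints the occluded ones."""
    def __init__(self, patch_size=32, dim=256, occlusion_ratio=0.75):
        super().__init__()
        self.ratio = occlusion_ratio
        self.patch_embed = nn.Sequential(nn.Linear(patch_size * 3, dim), nn.GELU(),
                                         nn.Linear(dim, dim))
        self.pos_embed = nn.Linear(3, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=6)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.head = nn.Linear(dim, patch_size * 3)   # predict occluded patch coords

    def forward(self, points):
        patches, centers = group_points(points)      # (B, P, K, 3), (B, P, 3)
        B, P, K, _ = patches.shape
        num_vis = int(P * (1 - self.ratio))
        perm = torch.rand(B, P, device=points.device).argsort(dim=1)
        vis_idx, occ_idx = perm[:, :num_vis], perm[:, num_vis:]

        def take(x, idx):
            # gather whole patches / centers along the patch dimension
            return torch.gather(x, 1, idx.view(B, -1, *([1] * (x.dim() - 2)))
                                .expand(-1, -1, *x.shape[2:]))

        # encoder runs on the visible subset only
        vis_tokens = self.patch_embed(take(patches, vis_idx).flatten(2)) \
                     + self.pos_embed(take(centers, vis_idx))
        latent = self.encoder(vis_tokens)

        # lightweight decoder attends over visible features + mask tokens
        occ_tokens = self.mask_token.expand(B, P - num_vis, -1) \
                     + self.pos_embed(take(centers, occ_idx))
        decoded = self.decoder(torch.cat([latent, occ_tokens], dim=1))
        pred = self.head(decoded[:, num_vis:]).view(B, P - num_vis, K, 3)
        target = take(patches, occ_idx)
        return ((pred - target) ** 2).mean()          # placeholder for a Chamfer-style loss
```

Because 75% of the patches never enter the encoder, each training step processes far fewer tokens, which is the source of the 3-4x speed-up claimed in the abstract.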