Anchor-Based Spatio-Temporal Attention 3D Convolutional Networks for Dynamic 3D Point Cloud Sequences

Paper Title

Anchor-Based Spatio-Temporal Attention 3D Convolutional Networks for Dynamic 3D Point Cloud Sequences

Authors

Wang, Guangming; Chen, Muyao; Liu, Hanwen; Yang, Yehui; Liu, Zhe; Wang, Hesheng

Abstract

With the rapid development of measurement technology, LiDAR and depth cameras are widely used to perceive 3D environments. Recent learning-based methods for robot perception mostly focus on images or videos, while deep learning methods for dynamic 3D point cloud sequences remain underexplored. Developing efficient and accurate perception methods compatible with these advanced instruments is therefore pivotal to autonomous driving and service robots. This paper proposes an Anchor-based Spatio-Temporal Attention 3D Convolution operation (ASTA3DConv) to process dynamic 3D point cloud sequences. The proposed convolution builds a regular receptive field around each point by placing several virtual anchors around it. The features of neighborhood points are first aggregated to each anchor through a spatio-temporal attention mechanism. Then, anchor-based 3D convolution aggregates these anchors' features to the core points. The proposed method makes better use of the structured information within the local region and learns spatio-temporal embedding features from dynamic 3D point cloud sequences. Based on ASTA3DConv, Anchor-based Spatio-Temporal Attention 3D Convolutional Neural Networks (ASTA3DCNNs) are built for classification and segmentation and evaluated on action recognition and semantic segmentation tasks. Experiments and ablation studies on the MSRAction3D and Synthia datasets demonstrate the superior performance and effectiveness of our method for dynamic 3D point cloud sequences. Our method achieves state-of-the-art performance on MSRAction3D and Synthia among methods that take dynamic 3D point cloud sequences as input.
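The two-stage data flow described in the abstract (neighbors to virtual anchors via attention, then anchors to the core point via a fixed-order convolution) can be sketched for a single core point as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the four tetrahedral anchor positions, the fixed negative spatio-temporal distance used as the attention score (standing in for whatever learned attention the paper uses), the ReLU, and the names `tetrahedron_anchors`, `aggregate_to_anchor`, `anchor_conv`, and `asta3dconv_point` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helper names throughout; this is not the authors' code.
Vec = List[float]
# A neighbor is (xyz position, frame timestamp, feature vector).
Neighbor = Tuple[Sequence[float], float, Sequence[float]]


def tetrahedron_anchors(core: Sequence[float], radius: float) -> List[Vec]:
    """Hypothetical layout: 4 virtual anchors on a regular tetrahedron
    centred on the core point, each at distance `radius` from it."""
    s = radius / math.sqrt(3.0)
    offsets = [(s, s, s), (s, -s, -s), (-s, s, -s), (-s, -s, s)]
    return [[c + o for c, o in zip(core, off)] for off in offsets]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_to_anchor(anchor: Sequence[float], core_time: float,
                        neighbors: Sequence[Neighbor],
                        temporal_weight: float = 1.0,
                        temperature: float = 1.0) -> Vec:
    """Attention-weighted sum of neighbor features for one anchor.
    Assumption: the score is a fixed negative spatio-temporal distance,
    standing in for a learned attention module."""
    scores = []
    for pos, t, _ in neighbors:
        d2 = sum((p - a) ** 2 for p, a in zip(pos, anchor))
        dt2 = (t - core_time) ** 2
        scores.append(-(d2 + temporal_weight * dt2) / temperature)
    weights = softmax(scores)
    out = [0.0] * len(neighbors[0][2])
    for w, (_, _, feat) in zip(weights, neighbors):
        for i, f in enumerate(feat):
            out[i] += w * f
    return out


def anchor_conv(anchor_feats: Sequence[Vec], kernels: Sequence[Sequence[Vec]],
                bias: Sequence[float]) -> Vec:
    """Anchor-based convolution: since the anchors have a fixed order, anchor
    slot k gets its own C_out x C_in weight matrix kernels[k], like one tap
    of a regular 3D convolution kernel. A ReLU is applied at the end."""
    out = list(bias)
    for feat, weight in zip(anchor_feats, kernels):
        for o, row in enumerate(weight):
            out[o] += sum(w * f for w, f in zip(row, feat))
    return [max(0.0, v) for v in out]


def asta3dconv_point(core: Sequence[float], core_time: float,
                     neighbors: Sequence[Neighbor],
                     kernels: Sequence[Sequence[Vec]], bias: Sequence[float],
                     radius: float = 0.5) -> Vec:
    """New feature for one core point: anchors -> attention -> anchor conv."""
    anchors = tetrahedron_anchors(core, radius)
    anchor_feats = [aggregate_to_anchor(a, core_time, neighbors) for a in anchors]
    return anchor_conv(anchor_feats, kernels, bias)


if __name__ == "__main__":
    # Two neighbors from the current frame (t=0), one from the previous frame (t=-1).
    nbrs = [([0.3, 0.3, 0.3], 0.0, [1.0, 0.0]),
            ([-0.3, -0.3, 0.3], 0.0, [0.0, 1.0]),
            ([0.0, 0.0, 0.0], -1.0, [0.5, 0.5])]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(asta3dconv_point([0.0, 0.0, 0.0], 0.0, nbrs, [identity] * 4, [0.0, 0.0]))
```

Running the script prints a two-channel feature for the core point. In a trained network the attention scores and kernels would be learned and the operation would be batched over all points and frames; the loop-based version above only makes explicit why fixed anchors give a regular receptive field: each anchor slot always maps to the same kernel weights, regardless of how the raw neighbors are scattered.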
