Paper Title
DCANet: Differential Convolution Attention Network for RGB-D Semantic Segmentation
Paper Authors
Paper Abstract
Combining RGB images with their corresponding depth maps has proven effective for semantic segmentation in the past few years. Existing RGB-D modal fusion methods either lack non-linear feature fusion ability or treat the two modalities identically, regardless of the intrinsic distribution gap or information loss. Here we find that depth maps are well suited to providing intrinsic fine-grained object patterns due to their local depth continuity, while RGB images effectively provide a global view. Based on this, we propose a pixel differential convolution attention (DCA) module that accounts for geometric information and local-range correlations in depth data. Furthermore, we extend DCA to ensemble differential convolution attention (EDCA), which propagates long-range contextual dependencies and seamlessly incorporates the spatial distribution of RGB data. DCA and EDCA dynamically adjust convolutional weights by pixel differences to achieve self-adaptation at local and long range, respectively. A two-branch network built with DCA and EDCA, called the Differential Convolution Attention Network (DCANet), is proposed to fuse the local and global information of the two modalities. Consequently, the individual advantages of RGB and depth data are emphasized. Our DCANet sets a new state-of-the-art performance for RGB-D semantic segmentation on two challenging benchmark datasets, i.e., NYUDv2 and SUN-RGBD.
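To make the idea of difference-modulated convolution attention concrete, below is a minimal, hypothetical PyTorch sketch: an attention map is derived from each pixel's deviation from its local neighborhood mean and used to re-weight the features. The class name, layer choices, and sigmoid gating are illustrative assumptions for this sketch, not the authors' published DCA implementation.

```python
import torch
import torch.nn as nn

class PixelDifferenceAttention(nn.Module):
    """Hypothetical sketch of a pixel-difference attention block (not the
    authors' DCA implementation). Features are gated by an attention map
    computed from local pixel differences, so responses adapt to local
    geometric variation such as depth discontinuities."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        # Neighborhood mean used to form the pixel differences.
        self.pool = nn.AvgPool2d(kernel_size, stride=1, padding=pad)
        # Projects the difference map into per-pixel attention logits.
        self.conv = nn.Conv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pixel difference: deviation of each pixel from its local mean.
        diff = x - self.pool(x)
        # Gate in (0, 1); strong local variation alters the weighting.
        attn = torch.sigmoid(self.conv(diff))
        return x * attn

# Toy usage on a random feature map standing in for depth features.
feats = torch.randn(1, 64, 32, 32)
out = PixelDifferenceAttention(64)(feats)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Under this reading, an EDCA-style branch would compute the differences over a larger (or global) support instead of a 3x3 neighborhood, which is how the abstract distinguishes local self-adaptation for depth from long-range adaptation for RGB.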