Real-time Semantic Segmentation with Context Aggregation Network

Paper Title

Real-time Semantic Segmentation with Context Aggregation Network

Authors

Michael Ying Yang, Saumya Kumaar, Ye Lyu, Francesco Nex

Abstract

With the increasing demand for autonomous systems, pixel-wise semantic segmentation for visual scene understanding needs to be not only accurate but also efficient enough for potential real-time applications. In this paper, we propose Context Aggregation Network, a dual-branch convolutional neural network with significantly lower computational cost than the state of the art, while maintaining competitive prediction accuracy. Building upon existing dual-branch architectures for high-speed semantic segmentation, we design a cheap high-resolution branch for effective spatial detailing and a context branch with lightweight versions of global aggregation and local distribution blocks, able to capture both the long-range and local contextual dependencies required for accurate semantic segmentation, with low computational overhead. We evaluate our method on two semantic segmentation datasets, Cityscapes and UAVid. On the Cityscapes test set, our model achieves state-of-the-art results with an mIoU of 75.9%, at 76 FPS on an NVIDIA RTX 2080Ti and 8 FPS on a Jetson Xavier NX. On the UAVid dataset, our proposed network achieves an mIoU of 63.5% at high execution speed (15 FPS).
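
The accuracy figures in the abstract are reported as mIoU (mean intersection-over-union). As a rough illustration of what that metric measures, the following is a minimal sketch in plain Python, not the authors' evaluation code. The function name `mean_iou`, the flat label sequences, and the `ignore_index=255` default are assumptions made for this example; benchmark tooling typically accumulates the confusion matrix over an entire dataset rather than a single short sequence.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat sequences of integer class labels of equal
    length. Pixels whose target label equals ignore_index are skipped
    (the value 255 is an assumption for this sketch).
    """
    # confusion[t][p] counts pixels of true class t predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # true class c, predicted as another class
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # predicted c, true class differs
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(tp / denom)

    # Average the per-class IoU values; NaN if no class was observed.
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]  # the last pixel is ignored
    print(round(mean_iou(pred, target, num_classes=3), 4))
```

In this toy input the per-class IoU values are 1/2, 2/3 and 1, so the mean is 13/18. Averaging per class rather than per pixel is what keeps small classes from being swamped by large ones such as road or sky.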
