Title

SFNet: Faster and Accurate Semantic Segmentation via Semantic Flow

Authors

Xiangtai Li, Jiangning Zhang, Yibo Yang, Guangliang Cheng, Kuiyuan Yang, Yunhai Tong, Dacheng Tao

Abstract


In this paper, we focus on exploring effective methods for faster and more accurate semantic segmentation. A common practice for improving performance is to attain high-resolution feature maps with strong semantic representation. Two strategies are widely used, atrous convolutions and feature pyramid fusion, but both are either computationally intensive or ineffective. Inspired by optical flow for motion alignment between adjacent video frames, we propose a Flow Alignment Module (FAM) to learn Semantic Flow between feature maps of adjacent levels and broadcast high-level features to high-resolution features effectively and efficiently. Furthermore, integrating our FAM into a standard feature pyramid structure exhibits superior performance over other real-time methods, even on lightweight backbone networks such as ResNet-18 and DFNet. To further speed up inference, we also present a novel Gated Dual Flow Alignment Module that directly aligns high-resolution and low-resolution feature maps; we term this improved network SFNet-Lite. Extensive experiments are conducted on several challenging datasets, and the results show the effectiveness of both SFNet and SFNet-Lite. In particular, on the Cityscapes test set, the SFNet-Lite series achieves 80.1 mIoU while running at 60 FPS with a ResNet-18 backbone, and 78.8 mIoU while running at 120 FPS with an STDC backbone on an RTX 3090. Moreover, we unify four challenging driving datasets into one large dataset, which we name the Unified Driving Segmentation (UDS) dataset. It contains diverse domain and style information. We benchmark several representative works on UDS. Both SFNet and SFNet-Lite still achieve the best speed-accuracy trade-off on UDS, serving as strong baselines in this challenging setting. The code and models are publicly available at https://github.com/lxtGH/SFSegNets.
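The core operation behind flow alignment is warping one feature map by a learned 2-D offset field (the "semantic flow"), analogous to how optical flow warps a video frame. Below is a minimal NumPy sketch of that bilinear warping step only; the function name and tensor layout are illustrative assumptions, not the authors' implementation (which also includes the flow-prediction convolutions and lives at the repository above).

```python
import numpy as np

def warp_with_flow(feature, flow):
    """Bilinearly sample `feature` (C, H, W) at positions offset by
    `flow` (2, H, W), where flow[0] is the x-offset and flow[1] the
    y-offset in pixels. Sampling positions are clamped to the border."""
    C, H, W = feature.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Source positions each output pixel reads from.
    sx = np.clip(xs + flow[0], 0, W - 1)
    sy = np.clip(ys + flow[1], 0, H - 1)
    # Integer corners and fractional weights for bilinear interpolation.
    x0 = np.floor(sx).astype(int); x1 = np.clip(x0 + 1, 0, W - 1)
    y0 = np.floor(sy).astype(int); y1 = np.clip(y0 + 1, 0, H - 1)
    wx = sx - x0; wy = sy - y0
    return ((1 - wy) * (1 - wx) * feature[:, y0, x0]
            + (1 - wy) * wx * feature[:, y0, x1]
            + wy * (1 - wx) * feature[:, y1, x0]
            + wy * wx * feature[:, y1, x1])
```

In the FAM, the flow field would be predicted by a small convolution over the concatenated low- and high-resolution features, then used to warp the upsampled high-level map before fusion; a zero flow reduces the warp to an identity, so the module can fall back to plain upsampling.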
