Title
Prompt-Matched Semantic Segmentation
Authors
Abstract
The objective of this work is to explore how to effectively and efficiently adapt pre-trained visual foundation models to various downstream tasks of semantic segmentation. Previous methods usually fine-tuned the entire network for each specific dataset, which is burdensome since the massive parameters of all these networks must be stored. A few recent works attempted to insert extra trainable parameters into the frozen networks to learn visual prompts for parameter-efficient tuning. However, these works showed poor generality, as they were designed specifically for Transformers. Moreover, given the limited information available in these schemes, they exhibited a poor capacity to learn beneficial prompts. To alleviate these issues, we propose a novel Stage-wise Prompt-Matched Framework for generic and effective visual prompt tuning. Specifically, to ensure generality, we divide the pre-trained backbone with frozen parameters into multiple stages and perform prompt learning between different stages, which makes the proposed scheme applicable to various architectures of CNNs and Transformers. For effective tuning, a lightweight Semantic-aware Prompt Matcher (SPM) is designed to progressively learn reasonable prompts with a recurrent mechanism, guided by the rich information of interim semantic maps. Working as a deep matched filter for representation learning, the proposed SPM can transform the output of the previous stage into a desirable input for the next stage, thus better matching and stimulating the pre-trained knowledge. Extensive experiments on four benchmarks demonstrate that the proposed scheme achieves a promising trade-off between parameter efficiency and performance. Our code and models will be released.
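The core idea of stage-wise prompt tuning can be sketched as follows. This is a hypothetical, heavily simplified illustration (the `Stage` and `PromptMatcher` classes are stand-ins, and the paper's actual SPM uses a recurrent mechanism guided by interim semantic maps, which is omitted here): the backbone is split into frozen stages, and only a small module inserted between stages is trainable.

```python
# Hypothetical sketch of stage-wise prompt tuning; not the paper's implementation.
import random

random.seed(0)

class Stage:
    """A frozen backbone stage, modeled as a fixed random linear map."""
    def __init__(self, dim):
        # Frozen weights: never updated during tuning.
        self.w = [[random.uniform(-0.1, 0.1) for _ in range(dim)]
                  for _ in range(dim)]
        self.n_params = dim * dim

    def forward(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

class PromptMatcher:
    """Lightweight trainable module inserted between stages; in this
    setting, these are the only parameters updated for a new dataset."""
    def __init__(self, dim):
        self.bias = [0.0] * dim  # trainable prompt parameters
        self.n_params = dim

    def forward(self, x):
        # Transform the previous stage's output into a (hopefully)
        # more suitable input for the next frozen stage.
        return [xi + bi for xi, bi in zip(x, self.bias)]

def build_model(dim=64, n_stages=4):
    stages = [Stage(dim) for _ in range(n_stages)]
    matchers = [PromptMatcher(dim) for _ in range(n_stages - 1)]
    return stages, matchers

def forward(stages, matchers, x):
    for i, stage in enumerate(stages):
        x = stage.forward(x)
        if i < len(matchers):  # prompt learning happens between stages
            x = matchers[i].forward(x)
    return x

stages, matchers = build_model()
frozen = sum(s.n_params for s in stages)
trainable = sum(m.n_params for m in matchers)
print(f"frozen: {frozen}, trainable: {trainable}")  # trainable << frozen
```

The parameter count at the end shows the motivation: per-dataset storage is reduced to the small inter-stage modules, while the frozen backbone is shared across tasks.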