Wave-SAN: Wavelet based Style Augmentation Network for Cross-Domain Few-Shot Learning

Paper Title

Wave-SAN: Wavelet based Style Augmentation Network for Cross-Domain Few-Shot Learning

Authors

Fu, Yuqian, Xie, Yu, Fu, Yanwei, Chen, Jingjing, Jiang, Yu-Gang

Abstract

Previous few-shot learning (FSL) work has mostly been limited to natural images of general concepts and categories, and assumes very high visual similarity between the source and target classes. In contrast, the recently proposed cross-domain few-shot learning (CD-FSL) aims to transfer knowledge from general natural images with many labeled examples to novel domain-specific target categories with only a few labeled examples. The key challenge of CD-FSL lies in the large data shift between the source and target domains, which typically takes the form of totally different visual styles. This makes it highly nontrivial to extend classical FSL methods directly to the CD-FSL task. To this end, this paper studies CD-FSL by spanning the style distributions of the source dataset. In particular, a wavelet transform is introduced to decompose visual representations into low-frequency components, such as shape and style, and high-frequency components, such as texture. To make the model robust to visual styles, the source images are augmented by swapping the styles of their low-frequency components with each other. We propose a novel Style Augmentation (StyleAug) module to implement this idea. Furthermore, we present a Self-Supervised Learning (SSL) module to ensure that the predictions for style-augmented images are semantically similar to those for the unchanged images. This avoids the potential semantic drift caused by exchanging styles. Extensive experiments on two CD-FSL benchmarks show the effectiveness of our method. Our code and models will be released.
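
The style-swapping idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and it rests on several assumptions that the abstract does not state: a one-level Haar wavelet is used, "style" is taken to be the per-channel mean and standard deviation of the low-frequency band (an AdaIN-like choice), and the swap is applied to raw arrays rather than to network features. The function names haar_dwt2, haar_idwt2, swap_style, and style_augment are hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar transform of an array shaped (..., H, W), H and W even."""
    a = x[..., 0::2, 0::2]
    b = x[..., 0::2, 1::2]
    c = x[..., 1::2, 0::2]
    d = x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2.0  # low-frequency band
    lh = (a - b + c - d) / 2.0  # high-frequency bands
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, (lh, hl, hh)

def haar_idwt2(ll, highs):
    """Inverse of haar_dwt2; reconstructs the (..., 2h, 2w) array."""
    lh, hl, hh = highs
    shape = ll.shape[:-2] + (ll.shape[-2] * 2, ll.shape[-1] * 2)
    out = np.empty(shape, dtype=ll.dtype)
    out[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[..., 0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    out[..., 1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    out[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out

def swap_style(ll_a, ll_b, eps=1e-5):
    """Exchange per-channel mean/std between two low-frequency bands.

    Assumption: 'style' means these channel statistics (AdaIN-like).
    """
    axes = (-2, -1)
    mu_a = ll_a.mean(axis=axes, keepdims=True)
    mu_b = ll_b.mean(axis=axes, keepdims=True)
    sd_a = ll_a.std(axis=axes, keepdims=True) + eps
    sd_b = ll_b.std(axis=axes, keepdims=True) + eps
    a_with_b = (ll_a - mu_a) / sd_a * sd_b + mu_b
    b_with_a = (ll_b - mu_b) / sd_b * sd_a + mu_a
    return a_with_b, b_with_a

def style_augment(img_a, img_b):
    """Swap low-frequency styles of two (C, H, W) arrays, keep high frequencies."""
    ll_a, high_a = haar_dwt2(img_a)
    ll_b, high_b = haar_dwt2(img_b)
    new_a, new_b = swap_style(ll_a, ll_b)
    return haar_idwt2(new_a, high_a), haar_idwt2(new_b, high_b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img_a = rng.random((3, 8, 8))
    img_b = rng.random((3, 8, 8)) * 2.0 + 1.0
    aug_a, aug_b = style_augment(img_a, img_b)
    print(aug_a.shape, aug_b.shape)
```

In this sketch each image keeps its own high-frequency bands and only the statistics of the low-frequency band are exchanged, which is one plausible reading of "swapping the styles of their low-frequency components with each other".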
