Paper Title
Towards Unsupervised Domain Adaptation via Domain-Transformer
Paper Authors
Paper Abstract
As a vital problem in pattern analysis and machine intelligence, Unsupervised Domain Adaptation (UDA) attempts to transfer an effective feature learner from a labeled source domain to an unlabeled target domain. Inspired by the success of the Transformer, several advances in UDA have been achieved by adopting pure transformers as network architectures, but such a straightforward application can only capture patch-level information and lacks interpretability. To address these issues, we propose the Domain-Transformer (DoT) with a domain-level attention mechanism to capture the long-range correspondence between cross-domain samples. On the theoretical side, we provide a mathematical understanding of DoT: 1) we connect the domain-level attention with optimal transport theory, which provides an interpretation in terms of Wasserstein geometry; 2) from the perspective of learning theory, Wasserstein distance-based generalization bounds are derived, which explain the effectiveness of DoT for knowledge transfer. On the methodological side, DoT integrates the domain-level attention with manifold structure regularization, which together characterize sample-level information and locality consistency of cross-domain cluster structures. Besides, the domain-level attention mechanism can be used as a plug-and-play module, so DoT can be implemented under different neural network architectures. Instead of explicitly modeling the distribution discrepancy at the domain level or class level, DoT learns transferable features under the guidance of long-range correspondence, so it is free of pseudo-labels and explicit domain discrepancy optimization. Extensive experimental results on several benchmark datasets validate the effectiveness of DoT.
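The abstract describes domain-level attention as a plug-and-play module that captures long-range correspondence between cross-domain samples, with the attention matrix admitting an optimal-transport-style reading. The sketch below only illustrates that general idea under stated assumptions; the class name DomainAttention, the linear query/key projections, and the temperature parameter are hypothetical choices for illustration, not the authors' actual DoT implementation.

```python
# Minimal sketch (assumed implementation, not the paper's code): target-domain
# features attend over source-domain features, so each target sample is
# re-expressed as a convex combination of source samples.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DomainAttention(nn.Module):
    """Plug-and-play cross-domain attention between a target batch and a source batch."""

    def __init__(self, feat_dim: int, temperature: float = 1.0):
        super().__init__()
        self.query = nn.Linear(feat_dim, feat_dim)  # projects target features
        self.key = nn.Linear(feat_dim, feat_dim)    # projects source features
        self.temperature = temperature

    def forward(self, target_feats: torch.Tensor, source_feats: torch.Tensor):
        # target_feats: (n_t, d), source_feats: (n_s, d)
        q = self.query(target_feats)                          # (n_t, d)
        k = self.key(source_feats)                            # (n_s, d)
        scores = q @ k.t() / (q.size(-1) ** 0.5)              # (n_t, n_s) similarities
        attn = F.softmax(scores / self.temperature, dim=-1)   # row-stochastic weights,
                                                              # readable as a soft coupling
        # Each target feature is rewritten in terms of source features, giving a
        # long-range cross-domain correspondence without pseudo-labels.
        transported = attn @ source_feats                     # (n_t, d)
        return transported, attn


if __name__ == "__main__":
    src = torch.randn(32, 256)   # a batch of source-domain features
    tgt = torch.randn(16, 256)   # a batch of target-domain features
    module = DomainAttention(feat_dim=256)
    aligned_tgt, plan = module(tgt, src)
    print(aligned_tgt.shape, plan.shape)  # torch.Size([16, 256]) torch.Size([16, 32])
```

Because the module only consumes feature batches, it can in principle be attached to any backbone (CNN or transformer), which matches the abstract's claim that the mechanism is architecture-agnostic.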