Paper Title
Towards Unsupervised Domain Adaptation via Domain-Transformer
Paper Authors
Paper Abstract
As a vital problem in pattern analysis and machine intelligence, Unsupervised Domain Adaptation (UDA) attempts to transfer an effective feature learner from a labeled source domain to an unlabeled target domain. Inspired by the success of the Transformer, several advances in UDA have been achieved by adopting pure transformers as network architectures, but such a straightforward application can only capture patch-level information and lacks interpretability. To address these issues, we propose the Domain-Transformer (DoT) with a domain-level attention mechanism to capture the long-range correspondence between cross-domain samples. On the theoretical side, we provide a mathematical understanding of DoT: 1) we connect the domain-level attention with optimal transport theory, which provides an interpretation in terms of Wasserstein geometry; 2) from the perspective of learning theory, Wasserstein distance-based generalization bounds are derived, which explain the effectiveness of DoT for knowledge transfer. On the methodological side, DoT integrates the domain-level attention with manifold structure regularization, which together characterize sample-level information and locality consistency of cross-domain cluster structures. Besides, the domain-level attention mechanism can be used as a plug-and-play module, so DoT can be implemented under different neural network architectures. Instead of explicitly modeling the distribution discrepancy at the domain level or class level, DoT learns transferable features under the guidance of long-range correspondence, so it is free of pseudo-labels and explicit domain discrepancy optimization. Extensive experimental results on several benchmark datasets validate the effectiveness of DoT.
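The abstract describes domain-level attention as a plug-and-play module that captures long-range correspondence between cross-domain samples, with the attention matrix admitting an optimal-transport-style reading. The sketch below only illustrates that general idea under stated assumptions; the class name DomainAttention, the linear query/key projections, and the temperature parameter are hypothetical choices for illustration, not the authors' actual DoT implementation.

```python
# Minimal sketch (assumed implementation, not the paper's code): target-domain
# features attend over source-domain features, so each target sample is
# re-expressed as a convex combination of source samples.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DomainAttention(nn.Module):
    """Plug-and-play cross-domain attention between a target batch and a source batch."""

    def __init__(self, feat_dim: int, temperature: float = 1.0):
        super().__init__()
        self.query = nn.Linear(feat_dim, feat_dim)  # projects target features
        self.key = nn.Linear(feat_dim, feat_dim)    # projects source features
        self.temperature = temperature

    def forward(self, target_feats: torch.Tensor, source_feats: torch.Tensor):
        # target_feats: (n_t, d), source_feats: (n_s, d)
        q = self.query(target_feats)                          # (n_t, d)
        k = self.key(source_feats)                            # (n_s, d)
        scores = q @ k.t() / (q.size(-1) ** 0.5)              # (n_t, n_s) similarities
        attn = F.softmax(scores / self.temperature, dim=-1)   # row-stochastic weights,
                                                              # readable as a soft coupling
        # Each target feature is rewritten in terms of source features, giving a
        # long-range cross-domain correspondence without pseudo-labels.
        transported = attn @ source_feats                     # (n_t, d)
        return transported, attn


if __name__ == "__main__":
    src = torch.randn(32, 256)   # a batch of source-domain features
    tgt = torch.randn(16, 256)   # a batch of target-domain features
    module = DomainAttention(feat_dim=256)
    aligned_tgt, plan = module(tgt, src)
    print(aligned_tgt.shape, plan.shape)  # torch.Size([16, 256]) torch.Size([16, 32])
```

Because the module only consumes feature batches, it can in principle be attached to any backbone (CNN or transformer), which matches the abstract's claim that the mechanism is architecture-agnostic.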