Paper Title

Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts

Paper Authors

Tao Zhong, Zhixiang Chi, Li Gu, Yang Wang, Yuanhao Yu, Jin Tang

Paper Abstract

In this paper, we tackle the problem of domain shift. Most existing methods train a single model on multiple source domains and apply the same trained model to all unseen target domains. Such solutions are sub-optimal because each target domain exhibits its own specialty, to which the model is never adapted. Furthermore, expecting single-model training to learn extensive knowledge from multiple source domains is counterintuitive: the model is biased toward learning only domain-invariant features, which may result in negative knowledge transfer. In this work, we propose a novel framework for unsupervised test-time adaptation, formulated as a knowledge distillation process to address domain shift. Specifically, we incorporate a Mixture-of-Experts (MoE) as the teacher, where each expert is trained separately on a different source domain to maximize its specialty. Given a test-time target domain, a small set of unlabeled data is sampled to query the knowledge from the MoE. Since the source domains are correlated with the target domain, a transformer-based aggregator then combines the domain-specific knowledge by examining the interconnections among the experts. The aggregator's output is treated as a supervision signal to adapt a student prediction network toward the target domain. We further employ meta-learning to train the aggregator to distill positive knowledge and the student network to achieve fast adaptation. Extensive experiments demonstrate that the proposed method outperforms the state-of-the-art and validate the effectiveness of each proposed component. Our code is available at https://github.com/n3il666/Meta-DMoE.
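
The abstract describes a concrete test-time pipeline: per-domain expert teachers are queried with a small unlabeled target batch, a transformer-based aggregator fuses their features into a supervision signal, and a student network is adapted by a few distillation steps. The PyTorch sketch below illustrates that loop under stated assumptions; the class names, feature dimension, MSE distillation loss, and SGD update are all illustrative placeholders rather than the authors' implementation (see the linked repository for the official code).

```python
# Minimal sketch of the Meta-DMoE test-time adaptation loop as described
# in the abstract. All module names, dimensions, and loss choices here are
# assumptions for illustration, NOT the official implementation.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Aggregator(nn.Module):
    """Transformer encoder that mixes per-expert features into one signal
    by attending over the expert dimension (assumed design)."""
    def __init__(self, dim=512, heads=8, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, expert_feats):                   # (B, num_experts, dim)
        return self.encoder(expert_feats).mean(dim=1)  # (B, dim)

@torch.no_grad()
def query_experts(experts, x):
    """Each expert was trained on one source domain; stack their features."""
    return torch.stack([e(x) for e in experts], dim=1)  # (B, num_experts, dim)

def adapt_student(student, aggregator, experts, x_unlabeled, steps=1, lr=1e-3):
    """Adapt a copy of the student to an unseen target domain by distilling
    the aggregated MoE knowledge from a small unlabeled batch."""
    student = copy.deepcopy(student)  # keep meta-trained weights for reuse
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    target = aggregator(query_experts(experts, x_unlabeled)).detach()
    for _ in range(steps):  # a few steps suffice after meta-training
        loss = F.mse_loss(student(x_unlabeled), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student
```

Deep-copying the meta-trained student before each target domain mirrors the episodic setup implied by the meta-learning objective: every unseen domain starts from the same initialization and adapts with only a few gradient steps.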
