Improving Multi-Task Generalization via Regularizing Spurious Correlation

Paper Title

Improving Multi-Task Generalization via Regularizing Spurious Correlation

Authors

Hu, Ziniu; Zhao, Zhe; Yi, Xinyang; Yao, Tiansheng; Hong, Lichan; Sun, Yizhou; Chi, Ed H.

Abstract

Multi-Task Learning (MTL) is a powerful learning paradigm that improves generalization performance via knowledge sharing. However, existing studies find that MTL can sometimes hurt generalization, especially when two tasks are less correlated. One possible cause is spurious correlation: some knowledge is spurious and not causally related to the task labels, but the model may mistakenly exploit it and thus fail when the correlation changes. In the MTL setup, spurious correlation poses several unique challenges. First, the risk of encoding non-causal knowledge is higher, because the shared MTL model needs to encode all knowledge from different tasks, and knowledge that is causal for one task may be spurious for another. Second, the confounder between task labels introduces a different type of spurious correlation into MTL. We theoretically prove that MTL is more prone than single-task learning to taking non-causal knowledge from other tasks, and thus generalizes worse. To solve this problem, we propose the Multi-Task Causal Representation Learning framework, which aims to represent multi-task knowledge via disentangled neural modules and to learn which modules are causally related to each task via MTL-specific invariant regularization. Experiments show that, by alleviating the spurious correlation problem, it improves MTL models' performance by 5.5% on average over Multi-MNIST, MovieLens, Taskonomy, CityScape, and NYUv2.
