Paper Title

How Important is the Train-Validation Split in Meta-Learning?

Authors

Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, Caiming Xiong

Abstract

Meta-learning aims to perform fast adaptation on a new task through learning a "prior" from multiple existing tasks. A common practice in meta-learning is to perform a train-validation split (\emph{train-val method}) where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split. Despite its prevalence, the importance of the train-validation split is not well understood either in theory or in practice, particularly in comparison to the more direct \emph{train-train method}, which uses all the per-task data for both training and evaluation. We provide a detailed theoretical study on whether and when the train-validation split is helpful in the linear centroid meta-learning problem. In the agnostic case, we show that the expected loss of the train-val method is minimized at the optimal prior for meta testing, and this is not the case for the train-train method in general without structural assumptions on the data. In contrast, in the realizable case where the data are generated from linear models, we show that both the train-val and train-train losses are minimized at the optimal prior in expectation. Further, perhaps surprisingly, our main result shows that the train-train method achieves a \emph{strictly better} excess loss in this realizable case, even when the regularization parameter and split ratio are optimally tuned for both methods. Our results highlight that sample splitting may not always be preferable, especially when the data is realizable by the model. We validate our theories by experimentally showing that the train-train method can indeed outperform the train-val method, on both simulations and real meta-learning tasks.
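
To make the two meta-objectives concrete, below is a minimal NumPy sketch of the train-val and train-train losses in a linear centroid setting, assuming ridge-regression adaptation centered at a prior `w0`. The function names, the 50/50 split ratio, and the hyperparameters are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np

def adapt(X, y, w0, lam):
    """Per-task adaptation: ridge regression biased toward the prior w0.
    Closed form of min_w ||X w - y||^2 + lam * ||w - w0||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w0)

def train_val_loss(tasks, w0, lam, split=0.5):
    """Train-val meta-objective: adapt on one split of each task's data,
    then evaluate the adapted predictor on the held-out split."""
    losses = []
    for X, y in tasks:
        k = int(split * len(y))
        w = adapt(X[:k], y[:k], w0, lam)
        r = X[k:] @ w - y[k:]
        losses.append(r @ r / len(r))
    return np.mean(losses)

def train_train_loss(tasks, w0, lam):
    """Train-train meta-objective: adapt and evaluate on all per-task data."""
    losses = []
    for X, y in tasks:
        w = adapt(X, y, w0, lam)
        r = X @ w - y
        losses.append(r @ r / len(y))
    return np.mean(losses)

# Toy realizable setting (illustrative): each task vector w_t is drawn
# around a shared centroid w_star, and labels are near-noiseless linear.
rng = np.random.default_rng(0)
d, n, T = 5, 20, 200
w_star = rng.normal(size=d)
tasks = []
for _ in range(T):
    w_t = w_star + 0.1 * rng.normal(size=d)
    X = rng.normal(size=(n, d))
    tasks.append((X, X @ w_t + 0.01 * rng.normal(size=n)))
print(train_val_loss(tasks, w_star, lam=1.0))
print(train_train_loss(tasks, w_star, lam=1.0))
```

In the actual meta-learning problem, the prior `w0` is the optimization variable: one minimizes either meta-objective over `w0` (both are quadratic in `w0` here), and the paper's comparison concerns the excess loss of the resulting priors at meta-test time, with the regularization parameter and split ratio tuned for each method.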
