Paper Title
Continual Learning with Foundation Models: An Empirical Study of Latent Replay
Paper Authors
Paper Abstract
Rapid development of large-scale pre-training has resulted in foundation models that can act as effective feature extractors on a variety of downstream tasks and domains. Motivated by this, we study the efficacy of pre-trained vision models as a foundation for downstream continual learning (CL) scenarios. Our goal is twofold. First, we want to understand the compute-accuracy trade-off between CL in the raw-data space and in the latent space of pre-trained encoders. Second, we investigate how the characteristics of the encoder, the pre-training algorithm and data, and the resulting latent space affect CL performance. To this end, we compare the efficacy of various pre-trained models in large-scale benchmarking scenarios with a vanilla replay setting applied in the latent and in the raw-data space. Notably, this study shows how transfer, forgetting, task similarity, and learning depend on the characteristics of the input data and not necessarily on the CL algorithm. First, we show that under some circumstances reasonable CL performance can readily be achieved with a non-parametric classifier at negligible compute cost. We then show how models pre-trained on broader data yield better performance for various replay buffer sizes. We explain this with the representational similarity and transfer properties of these representations. Finally, we show the effectiveness of self-supervised pre-training for downstream domains that are out-of-distribution with respect to the pre-training domain. We point out and validate several research directions that can further increase the efficacy of latent CL, including representation ensembling. The diverse set of datasets used in this study can serve as a compute-efficient playground for further CL research. The codebase is available at https://github.com/oleksost/latent_CL.
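To make the latent-replay setting described above concrete, below is a minimal sketch of vanilla replay applied in the latent space of a frozen pre-trained encoder. It is an illustration under stated assumptions, not the authors' released implementation (see the linked codebase): the encoder (an ImageNet-pretrained ResNet-18 from a recent torchvision), the buffer size, the class count, and the optimizer settings are placeholder choices.

```python
# Minimal sketch of "vanilla replay in the latent space" (illustrative only).
# Images are encoded once by a frozen pre-trained encoder; only latent vectors
# are stored in a small replay buffer, and only a linear head is trained.
import random
import torch
import torch.nn as nn
import torchvision.models as models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen pre-trained encoder (placeholder choice; any checkpoint could be used).
encoder = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
encoder.fc = nn.Identity()                     # expose the 512-d latent space
encoder.eval().requires_grad_(False)
encoder.to(device)

NUM_CLASSES = 100                              # hypothetical total class count
BUFFER_SIZE = 2000                             # hypothetical replay budget

head = nn.Linear(512, NUM_CLASSES).to(device)  # the only trainable component
optimizer = torch.optim.SGD(head.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
buffer = []                                    # list of (latent, label) pairs


def train_task(loader):
    """Train the head on one task while replaying latents from earlier tasks."""
    for images, labels in loader:
        with torch.no_grad():
            z = encoder(images.to(device))     # latents computed once, no backprop
        y = labels.to(device)

        # Mix the current batch with a batch of replayed latents.
        if buffer:
            z_old, y_old = zip(*random.sample(buffer, min(len(buffer), len(y))))
            z = torch.cat([z, torch.stack(z_old).to(device)])
            y = torch.cat([y, torch.stack(y_old).to(device)])

        loss = criterion(head(z), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Store current-task latents; once full, overwrite a random slot
        # (a crude stand-in for reservoir sampling).
        for zi, yi in zip(z[: len(labels)].cpu(), labels):
            if len(buffer) < BUFFER_SIZE:
                buffer.append((zi, yi))
            else:
                buffer[random.randrange(BUFFER_SIZE)] = (zi, yi)
```

The non-parametric baseline mentioned in the abstract can be built from the same frozen latents without training the head at all, e.g., by storing a mean latent per class and assigning new samples to the nearest class mean, which is what keeps its compute cost negligible.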