Paper Title

Building a Subspace of Policies for Scalable Continual Learning

Authors

Jean-Baptiste Gaya, Thang Doan, Lucas Caccia, Laure Soulier, Ludovic Denoyer, Roberta Raileanu

Abstract

The ability to continuously acquire new knowledge and skills is crucial for autonomous agents. Existing methods are typically based on either fixed-size models that struggle to learn a large number of diverse behaviors, or growing-size models that scale poorly with the number of tasks. In this work, we aim to strike a better balance between an agent's size and performance by designing a method that grows adaptively depending on the task sequence. We introduce Continual Subspace of Policies (CSP), a new approach that incrementally builds a subspace of policies for training a reinforcement learning agent on a sequence of tasks. The subspace's high expressivity allows CSP to perform well for many different tasks while growing sublinearly with the number of tasks. Our method does not suffer from forgetting and displays positive transfer to new tasks. CSP outperforms a number of popular baselines on a wide range of scenarios from two challenging domains, Brax (locomotion) and Continual World (manipulation).
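To give a rough intuition for what "a subspace of policies" means here, the sketch below (a minimal PyTorch illustration, not the authors' implementation) represents a policy's flat weight vector as a convex combination of a small set of anchor weight vectors; CSP grows the set of anchors adaptively as tasks arrive, so the parameter count scales with the number of anchors rather than the number of tasks. The names SubspacePolicy, add_anchor, and alpha are illustrative assumptions.

```python
# A minimal sketch of the "subspace of policies" idea that CSP builds on,
# NOT the authors' implementation: a policy's weights are a convex combination
# of a few anchor weight vectors, and a new anchor may be added when a new task
# needs extra capacity.
import torch
import torch.nn as nn

HIDDEN = 64  # width of the illustrative 2-layer policy network


class SubspacePolicy(nn.Module):
    """Policy whose flat weight vector lives in the convex hull of anchors."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.obs_dim, self.act_dim = obs_dim, act_dim
        n_params = obs_dim * HIDDEN + HIDDEN + HIDDEN * act_dim + act_dim
        # Start with a single anchor, i.e. a degenerate (point) subspace.
        self.anchors = nn.ParameterList([nn.Parameter(0.01 * torch.randn(n_params))])

    def add_anchor(self) -> None:
        # Grow the subspace for a new task; CSP decides adaptively whether the
        # extra anchor is worth keeping. Here we simply copy the last anchor.
        self.anchors.append(nn.Parameter(self.anchors[-1].detach().clone()))

    def forward(self, obs: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
        # alpha: convex coefficients (non-negative, summing to 1) over anchors;
        # each alpha picks out one concrete policy inside the subspace.
        flat = sum(a * w for a, w in zip(alpha, self.anchors))
        i = 0
        w1 = flat[i:i + self.obs_dim * HIDDEN].view(HIDDEN, self.obs_dim)
        i += self.obs_dim * HIDDEN
        b1 = flat[i:i + HIDDEN]
        i += HIDDEN
        w2 = flat[i:i + HIDDEN * self.act_dim].view(self.act_dim, HIDDEN)
        i += HIDDEN * self.act_dim
        b2 = flat[i:i + self.act_dim]
        h = torch.tanh(obs @ w1.T + b1)
        return torch.tanh(h @ w2.T + b2)


# Usage: grow to two anchors, sample a point in the subspace, and act.
policy = SubspacePolicy(obs_dim=8, act_dim=2)
policy.add_anchor()
alpha = torch.distributions.Dirichlet(torch.ones(len(policy.anchors))).sample()
action = policy(torch.randn(1, 8), alpha)
```

Because many tasks can reuse points inside the existing convex hull, only some tasks trigger a new anchor, which is what allows the model to grow sublinearly with the number of tasks while avoiding forgetting (old anchors are never overwritten).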
