Paper Title
Batch Exploration with Examples for Scalable Robotic Reinforcement Learning
Paper Authors
Paper Abstract
Learning from diverse offline datasets is a promising path towards learning general-purpose robotic agents. However, a core challenge in this paradigm lies in collecting large amounts of meaningful data, while not depending on a human in the loop for data collection. One way to address this challenge is through task-agnostic exploration, where an agent attempts to explore without a task-specific reward function and to collect data that can be useful for any downstream task. While these approaches have shown some promise in simple domains, they often struggle to explore the relevant regions of the state space in more challenging settings, such as vision-based robotic manipulation. This challenge stems from an objective that encourages exploring everything in a potentially vast state space. To mitigate this challenge, we propose to focus exploration on the important parts of the state space using weak human supervision. Concretely, we propose an exploration technique, Batch Exploration with Examples (BEE), that explores relevant regions of the state space, guided by a modest number of human-provided images of important states. These human-provided images only need to be collected once at the beginning of data collection and can be collected in a matter of minutes, allowing us to scalably collect diverse datasets, which can then be combined with any batch RL algorithm. We find that BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot, and observe that compared to task-agnostic and weakly-supervised exploration techniques, it (1) interacts more than twice as often with relevant objects, and (2) improves downstream task performance when used in conjunction with offline RL.
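To make the idea concrete, below is a minimal, hypothetical sketch (in PyTorch) of how an example-guided exploration signal could be computed. It is not the paper's implementation; the class names, network sizes, and training details are illustrative assumptions. The sketch fits a small classifier to distinguish the human-provided images of important states from images in the robot's own replay data, and uses its output as an intrinsic reward that biases data collection toward relevant states.

```python
# Hypothetical sketch of example-guided exploration rewards (not the BEE implementation).
# A small classifier is trained to separate human-provided "relevant" images from
# images drawn from the robot's own experience; its score is then used as an
# intrinsic reward that steers data collection toward relevant states.

import torch
import torch.nn as nn

class RelevanceClassifier(nn.Module):
    """Scores an image observation by how similar it looks to the human examples."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, obs):                 # obs: (B, 3, H, W), pixel values in [0, 1]
        return self.net(obs).squeeze(-1)    # unnormalized relevance logits, shape (B,)

def train_relevance(model, relevant_images, replay_images, steps=500, lr=1e-3):
    """Fit the classifier: human-provided images get label 1, replay images label 0."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    images = torch.cat([relevant_images, replay_images], dim=0)
    labels = torch.cat([torch.ones(len(relevant_images)),
                        torch.zeros(len(replay_images))], dim=0)
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
    return model

def exploration_reward(model, obs_batch):
    """Intrinsic reward: higher for observations the classifier deems relevant."""
    with torch.no_grad():
        return torch.sigmoid(model(obs_batch))
```

During data collection, a planner or exploration policy would maximize this intrinsic reward in place of a task reward; the resulting trajectories are stored and later handed to any batch/offline RL algorithm together with the actual downstream task reward.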