Title
CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning
Authors
Abstract
Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environments. To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a given set of blocks, inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with common causal structure and underlying factors (including, e.g., robot and object masses, colors, sizes). The user (or the agent) may intervene on all causal variables, which allows for fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning as well as precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.
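A shared parametrization makes interventions, single-factor train/evaluation splits, and interpolated curricula straightforward to express. The minimal Python sketch below illustrates this under stated assumptions. It supposes that a task is fully described by a few numeric causal variables. The `TaskParams` fields, their units and values, and the helper functions are all hypothetical illustrations, not CausalWorld's actual API. In the sketch, an intervention overwrites chosen variables, a curriculum linearly interpolates every variable from an initial task to a target task, and a sampler varies only object mass so that training and evaluation distributions differ in a single factor.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple


# Hypothetical causal variables of a block-stacking task. Field names, units
# and values are illustrative assumptions, not CausalWorld's real API.
@dataclass(frozen=True)
class TaskParams:
    block_mass: float                        # kg
    block_size: float                        # cube edge length, m
    block_color: Tuple[float, float, float]  # RGB, each in [0, 1]
    goal_height: float                       # height of the target shape, m


def intervene(params: TaskParams, **changes) -> TaskParams:
    """Set the named variables to new values and leave all others unchanged."""
    return replace(params, **changes)


def _lerp(a: float, b: float, alpha: float) -> float:
    # (1 - alpha) * a + alpha * b yields a exactly at alpha=0 and b at alpha=1.
    return (1.0 - alpha) * a + alpha * b


def interpolate(initial: TaskParams, target: TaskParams, alpha: float) -> TaskParams:
    """Blend every causal variable: alpha=0 gives initial, alpha=1 gives target."""
    return TaskParams(
        block_mass=_lerp(initial.block_mass, target.block_mass, alpha),
        block_size=_lerp(initial.block_size, target.block_size, alpha),
        block_color=tuple(
            _lerp(a, b, alpha) for a, b in zip(initial.block_color, target.block_color)
        ),
        goal_height=_lerp(initial.goal_height, target.goal_height, alpha),
    )


def curriculum(initial: TaskParams, target: TaskParams, num_stages: int) -> List[TaskParams]:
    """Evenly spaced tasks from initial to target, both endpoints included."""
    if num_stages < 2:
        raise ValueError("num_stages must be at least 2")
    return [interpolate(initial, target, k / (num_stages - 1)) for k in range(num_stages)]


def sample_mass_only(base: TaskParams, low: float, high: float, rng: random.Random) -> TaskParams:
    """Resample block_mass uniformly in [low, high]; every other variable stays fixed."""
    return intervene(base, block_mass=rng.uniform(low, high))


# Usage with made-up values: a 4-stage curriculum from a light red cube on a
# low goal to a heavier blue cube on a taller goal.
initial = TaskParams(block_mass=0.02, block_size=0.065,
                     block_color=(1.0, 0.0, 0.0), goal_height=0.065)
target = TaskParams(block_mass=0.08, block_size=0.065,
                    block_color=(0.0, 0.0, 1.0), goal_height=0.195)
stages = curriculum(initial, target, num_stages=4)

# Disjoint (hypothetical) mass ranges, so evaluation probes generalization in mass only.
rng = random.Random(0)
train_tasks = [sample_mass_only(initial, 0.015, 0.045, rng) for _ in range(5)]
eval_tasks = [sample_mass_only(initial, 0.05, 0.1, rng) for _ in range(5)]
```

Linear interpolation with a fixed number of stages is only one possible design. A curriculum could instead advance `alpha` based on the agent's success rate, or interpolate only a subset of variables.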