Paper Title

Online vs. Offline Adaptive Domain Randomization Benchmark

Paper Authors

Gabriele Tiboni, Karol Arndt, Giuseppe Averta, Ville Kyrki, Tatiana Tommasi

Paper Abstract

Physics simulators have shown great promise for conveniently learning reinforcement learning policies in safe, unconstrained environments. However, transferring the acquired knowledge to the real world can be challenging due to the reality gap. To this end, several methods have been recently proposed to automatically tune simulator parameters with posterior distributions given real data, for use with domain randomization at training time. These approaches have been shown to work for various robotic tasks under different settings and assumptions. Nevertheless, existing literature lacks a thorough comparison of existing adaptive domain randomization methods with respect to transfer performance and real-data efficiency. In this work, we present an open benchmark for both offline and online methods (SimOpt, BayRn, DROID, DROPO), to shed light on which are most suitable for each setting and task at hand. We found that online methods are limited by the quality of the currently learned policy for the next iteration, while offline methods may sometimes fail when replaying trajectories in simulation with open-loop commands. The code used will be released at https://github.com/gabrieletiboni/adr-benchmark.
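The methods benchmarked here (SimOpt, BayRn, DROID, DROPO) differ in how they collect real data and infer the posterior, but they share a common core: a distribution over simulator dynamics parameters is iteratively refitted so that simulated rollouts match real-world observations, and a policy is trained under domain randomization with that distribution. The sketch below is a minimal, self-contained illustration of this loop, not code from the benchmark or from any of the listed methods; the simulator, class, and function names (ToySim, offline_update, etc.) are hypothetical, and the fitting step is a crude elite-selection stand-in for the actual posterior inference these methods use.

```python
import numpy as np

# Minimal stand-in for a physics simulator whose dynamics parameters
# (e.g. masses, friction coefficients) can be overridden per episode.
class ToySim:
    def __init__(self, dynamics_params):
        self.p = np.asarray(dynamics_params, dtype=float)

    def replay(self, actions):
        # Open-loop replay: apply a recorded action sequence and return the
        # resulting state trajectory (a toy linear model of the dynamics).
        return np.cumsum(actions * self.p[0] - self.p[1], axis=0)


def sample_params(mean, std, size):
    """Draw candidate dynamics parameters from the current Gaussian randomization distribution."""
    return np.random.normal(mean, std, size=(size, len(mean)))


def offline_update(mean, std, real_actions, real_states, n_candidates=500, keep_frac=0.1):
    """Offline-style distribution update: replay the logged real-world action
    sequence in simulation under many candidate parameters and refit the
    distribution on the candidates whose trajectories best match the real states."""
    candidates = sample_params(mean, std, n_candidates)
    errors = np.array([
        np.mean((ToySim(p).replay(real_actions) - real_states) ** 2) for p in candidates
    ])
    elite = candidates[np.argsort(errors)[: int(n_candidates * keep_frac)]]
    return elite.mean(axis=0), elite.std(axis=0) + 1e-3  # keep a small variance floor


if __name__ == "__main__":
    # Pretend "real world" with unknown parameters [1.2, 0.05] and one logged trajectory.
    real_actions = np.random.uniform(-1, 1, size=(200, 1))
    real_states = ToySim([1.2, 0.05]).replay(real_actions)

    mean, std = np.array([1.0, 0.0]), np.array([0.5, 0.2])
    for i in range(5):  # an online method would instead re-collect real data with the newest policy here
        mean, std = offline_update(mean, std, real_actions, real_states)
        print(f"iter {i}: mean={mean.round(3)}, std={std.round(3)}")
```

This toy loop also hints at the trade-off the abstract reports: online methods gather fresh real rollouts with the current policy each iteration, so their updates inherit that policy's quality, whereas offline methods replay a fixed set of logged commands open-loop in simulation, which can fail when the replayed trajectories diverge from the recorded states.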
