Paper Title
OPT-Mimic: Imitation of Optimized Trajectories for Dynamic Quadruped Behaviors
Paper Authors
Paper Abstract
Reinforcement Learning (RL) has seen many recent successes for quadruped robot control. The imitation of reference motions provides a simple and powerful prior for guiding solutions towards desired behaviours without the need for meticulous reward design. While much work uses motion capture data or hand-crafted trajectories as the reference motion, relatively little work has explored the use of reference motions coming from model-based trajectory optimization. In this work, we investigate several design considerations that arise with such a framework, as demonstrated through four dynamic behaviours: trot, front hop, 180 backflip, and biped stepping. These are trained in simulation and transferred to a physical Solo 8 quadruped robot without further adaptation. In particular, we explore the space of feed-forward designs afforded by the trajectory optimizer to understand its impact on RL learning efficiency and sim-to-real transfer. These findings contribute to the long-standing goal of producing robot controllers that combine the interpretability and precision of model-based optimization with the robustness that model-free RL-based controllers offer.
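To make the imitation setup concrete, below is a minimal sketch of the two ingredients the abstract refers to: a DeepMimic-style reward that tracks the optimized reference trajectory, and a joint-space PD control law with an optional feed-forward torque supplied by the trajectory optimizer. The function names, gains, weights, and error scales here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Illustrative sketch only: all gains, weights, and scales below are
# assumptions for exposition, not the exact values used in OPT-Mimic.

def tracking_reward(q, qd, q_ref, qd_ref, w_pose=0.7, w_vel=0.3):
    """DeepMimic-style imitation reward: exponentiated negative tracking
    error between the robot state and the optimized reference motion."""
    pose_err = np.sum((q - q_ref) ** 2)   # joint-position tracking error
    vel_err = np.sum((qd - qd_ref) ** 2)  # joint-velocity tracking error
    return w_pose * np.exp(-5.0 * pose_err) + w_vel * np.exp(-0.1 * vel_err)

def joint_torques(q, qd, q_target, tau_ff, kp=3.0, kd=0.1):
    """PD tracking of the policy's target joint positions, plus an optional
    feed-forward torque from the trajectory optimizer; varying how tau_ff
    is supplied is one axis of the feed-forward design space."""
    return tau_ff + kp * (q_target - q) - kd * qd
```

In this kind of pipeline, setting `tau_ff` to zero recovers a pure PD imitation controller, while passing the optimizer's torques through unchanged leans more heavily on the model-based solution; the abstract's "space of feed-forward designs" spans choices between these extremes.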