Hierarchical Reinforcement Learning of Locomotion Policies in Response to Approaching Objects: A Preliminary Study

Paper Title

Hierarchical Reinforcement Learning of Locomotion Policies in Response to Approaching Objects: A Preliminary Study

Authors

Shangqun Yu, Sreehari Rammohan, Kaiyu Zheng, George Konidaris

Abstract

Animals such as rabbits and birds can instantly generate locomotion behavior in reaction to a dynamic, approaching object, such as a person or a rock, despite having possibly never seen the object before and having limited perception of the object's properties. Recently, deep reinforcement learning has enabled complex kinematic systems such as humanoid robots to successfully move from point A to point B. Inspired by the observation of the innate reactive behavior of animals in nature, we hope to extend this progress in robot locomotion to settings where external, dynamic objects are involved whose properties are partially observable to the robot. As a first step toward this goal, we build a simulation environment in MuJoCo where a legged robot must avoid getting hit by a ball moving toward it. We explore whether prior locomotion experiences that animals typically possess benefit the learning of a reactive control policy under a proposed hierarchical reinforcement learning framework. Preliminary results support the claim that the learning becomes more efficient using this hierarchical reinforcement learning method, even when partial observability (radius-based object visibility) is taken into account.
