One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control

Paper title

One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control

Authors

Wenlong Huang, Igor Mordatch, Deepak Pathak

Abstract

Reinforcement learning is typically concerned with learning control policies tailored to a particular agent. We investigate whether there exists a single global policy that can generalize to control a wide variety of agent morphologies, including ones in which even the dimensionality of the state and action spaces changes. We propose to express this global policy as a collection of identical modular neural networks, dubbed Shared Modular Policies (SMP), that correspond to each of the agent's actuators. Every module is only responsible for controlling its corresponding actuator and receives information from only its local sensors. In addition, messages are passed between modules, propagating information between distant modules. We show that a single modular policy can successfully generate locomotion behaviors for several planar agents with different skeletal structures, such as monopod hoppers, quadrupeds, and bipeds, and that it generalizes to variants not seen during training, a process that would normally require training and manual hyperparameter tuning for each morphology. We observe that a wide variety of drastically diverse locomotion styles across morphologies, as well as centralized coordination, emerges via message passing between decentralized modules purely from the reinforcement learning objective. Videos and code at https://huangwl18.github.io/modular-rl/
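The abstract describes the architecture only at a high level. Below is a minimal pure-Python sketch of the idea, not the authors' implementation: every actuator runs the same module with shared weights, sees only its own local observation, and exchanges fixed-size messages with its neighbors in the kinematic tree, first from the leaves to the root and then from the root back to the leaves. The two-pass schedule, summing child messages, single tanh layers, the message size, and all names (`SharedModule`, `smp_act`, the example trees) are assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine map W @ x + b on plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_out: int, n_in: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Uniform random weights scaled by 1/sqrt(n_in), zero bias."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class SharedModule:
    """One set of weights reused by every actuator (hypothetical sizes and layers)."""

    def __init__(self, obs_dim: int, msg_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.msg_dim = msg_dim
        # Bottom-up: local observation + summed child messages -> message for the parent.
        self.up = init_layer(msg_dim, obs_dim + msg_dim, rng)
        # Top-down: own upward message + parent's message -> [action, message for children].
        self.down = init_layer(1 + msg_dim, 2 * msg_dim, rng)

    def bottom_up(self, obs: Vector, child_msgs: List[Vector]) -> Vector:
        if child_msgs:
            summed = [sum(vals) for vals in zip(*child_msgs)]
        else:
            summed = [0.0] * self.msg_dim  # leaves receive an all-zero message
        return [math.tanh(v) for v in linear(*self.up, obs + summed)]

    def top_down(self, up_msg: Vector, parent_msg: Vector) -> Tuple[float, Vector]:
        out = [math.tanh(v) for v in linear(*self.down, up_msg + parent_msg)]
        return out[0], out[1:]  # action in (-1, 1), message for the children


def smp_act(module: SharedModule, parents: List[Optional[int]], obs: List[Vector]) -> List[float]:
    """Return one action per actuator for a tree given by parent indices (root is None).

    Assumes nodes are numbered so that every parent index is smaller than its child's.
    """
    n = len(parents)
    children: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i, p in enumerate(parents):
        if p is not None:
            children[p].append(i)

    up: List[Vector] = [[] for _ in range(n)]
    for i in reversed(range(n)):  # children are visited before their parents
        up[i] = module.bottom_up(obs[i], [up[c] for c in children[i]])

    actions = [0.0] * n
    down: List[Vector] = [[] for _ in range(n)]
    for i in range(n):  # parents are visited before their children
        p = parents[i]
        parent_msg = down[p] if p is not None else [0.0] * module.msg_dim
        actions[i], down[i] = module.top_down(up[i], parent_msg)
    return actions


if __name__ == "__main__":
    module = SharedModule(obs_dim=3, msg_dim=4)
    hopper = [None, 0, 1]  # hypothetical 3-joint chain
    quadruped = [None, 0, 1, 0, 3, 0, 5, 0, 7]  # hypothetical torso with four 2-joint legs
    for tree in (hopper, quadruped):
        obs = [[0.1 * i, 0.0, 1.0] for i in range(len(tree))]
        print(len(smp_act(module, tree, obs)))  # prints 3, then 9
```

Because no weight shape depends on the number of joints, the same `module` instance yields a 3-element action vector for the chain and a 9-element one for the quadruped; only the `parents` list changes. In the setting the abstract describes, these shared weights would be trained with a reinforcement learning objective, which this sketch omits.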
