Paper Title
Learning a Decentralized Multi-arm Motion Planner
Paper Authors
Paper Abstract
We present a closed-loop multi-arm motion planner that is scalable and flexible with team size. Traditional multi-arm robot systems have relied on centralized motion planners, whose runtimes often scale exponentially with team size and which therefore fail to handle dynamic environments under open-loop control. In this paper, we tackle this problem with multi-agent reinforcement learning, where a decentralized policy is trained to control one robot arm in the multi-arm system to reach its target end-effector pose, given observations of its workspace state and target end-effector pose. The policy is trained using Soft Actor-Critic with expert demonstrations from a sampling-based motion planning algorithm (BiRRT). By leveraging classical planning algorithms, we improve the learning efficiency of the reinforcement learning algorithm while retaining the fast inference time of neural networks. The resulting policy's inference time scales sub-linearly with team size, and the policy can be deployed on multi-arm systems of variable size. Thanks to the closed-loop and decentralized formulation, our approach generalizes to systems of 5-10 arms and to dynamically moving targets (>90% success rate for a 10-arm system), despite being trained only on 1-4 arm planning tasks with static targets. Code and data links can be found at https://multiarm.cs.columbia.edu.
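The abstract's core architectural claim is that one shared, decentralized policy, queried independently per arm, lets the same weights drive teams of any size. The sketch below illustrates that rollout structure in PyTorch; it is not the authors' code, and the observation size, action size, and network shape are illustrative assumptions, with the stochastic SAC actor reduced to a deterministic stand-in.

import torch
import torch.nn as nn

OBS_DIM = 64   # hypothetical size of one arm's local observation
               # (its workspace state + target end-effector pose)
ACT_DIM = 6    # hypothetical action size: one velocity command per joint

# Stand-in for the SAC actor network; the paper's actual observation
# and action encodings are not reproduced here.
policy = nn.Sequential(
    nn.Linear(OBS_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM), nn.Tanh(),  # bounded joint-velocity actions
)

def step_team(observations: torch.Tensor) -> torch.Tensor:
    """Compute one closed-loop control step for a whole team.

    observations: (num_arms, OBS_DIM) tensor; each row is one arm's
    local observation. Because every arm is controlled by the same
    decentralized policy, team size only changes the batch dimension,
    so inference cost grows gently with the number of arms instead of
    exponentially as in centralized planning.
    """
    with torch.no_grad():
        return policy(observations)

# The same weights trained on 1-4 arm tasks can be queried for a
# 10-arm team at deployment; only the batch dimension changes.
actions_4arm = step_team(torch.randn(4, OBS_DIM))
actions_10arm = step_team(torch.randn(10, OBS_DIM))

This per-arm weight sharing is what makes team size a deployment-time parameter rather than an architectural one, which is how the policy can generalize from 1-4 arm training tasks to 10-arm systems.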