Robotic Arm Control and Task Training through Deep Reinforcement Learning

Title

Robotic Arm Control and Task Training through Deep Reinforcement Learning

Authors

Franceschetti, Andrea; Tosello, Elisa; Castaman, Nicola; Ghidoni, Stefano

Abstract

This paper presents a detailed and extensive comparison of Trust Region Policy Optimization (TRPO) and Deep Q-Network with Normalized Advantage Functions (NAF) against other state-of-the-art algorithms, namely Deep Deterministic Policy Gradient (DDPG) and Vanilla Policy Gradient (VPG). The comparisons show that the former perform better than the latter when robotic arms are asked to accomplish manipulation tasks such as reaching a random target pose and picking and placing an object. Both simulated and real-world experiments are provided. Simulation lets us show the procedures we adopted to precisely estimate the algorithms' hyper-parameters and to correctly design good policies. Real-world experiments let us show that our policies, if correctly trained in simulation, can be transferred to and executed in a real environment with almost no changes.
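The abstract names Normalized Advantage Functions without defining them. In the standard formulation (Gu et al., 2016), the Q-function is split into a state value and a quadratic advantage: Q(x, u) = V(x) + A(x, u), with A(x, u) = -1/2 (u - mu(x))^T P(x) (u - mu(x)) and P(x) = L(x) L(x)^T, where L(x) is lower-triangular with a positive diagonal. Because A is never positive, the greedy action is simply u = mu(x), which is what makes Q-learning tractable for continuous joint commands. The sketch below evaluates that formula in pure Python, assuming the network has already produced V, mu and the raw entries of L; it is an illustration of the general technique, not the authors' implementation, and the function names `lower_triangular` and `naf_q_value` are hypothetical.

```python
import math
from typing import List, Sequence


def lower_triangular(entries: Sequence[float], n: int) -> List[List[float]]:
    """Build an n x n lower-triangular L from n(n+1)/2 raw network outputs.

    Entries are consumed row by row; diagonal entries are exponentiated so
    that P = L L^T is positive definite (an assumed, common parameterization).
    """
    assert len(entries) == n * (n + 1) // 2, "need n(n+1)/2 entries"
    L = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            L[i][j] = math.exp(entries[k]) if i == j else entries[k]
            k += 1
    return L


def naf_q_value(v: float, mu: Sequence[float],
                l_entries: Sequence[float], u: Sequence[float]) -> float:
    """Q(x, u) = V(x) - 1/2 (u - mu)^T L L^T (u - mu) for one state x."""
    n = len(mu)
    L = lower_triangular(l_entries, n)
    d = [u[i] - mu[i] for i in range(n)]
    # (u - mu)^T L L^T (u - mu) equals the squared norm of L^T (u - mu).
    lt_d = [sum(L[i][j] * d[i] for i in range(n)) for j in range(n)]
    advantage = -0.5 * sum(x * x for x in lt_d)
    return v + advantage


if __name__ == "__main__":
    # Two action dimensions, identity P: moving one unit away from mu costs 0.5.
    print(naf_q_value(1.0, [0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0]))
```

For a real arm, V, mu and the entries of L would come from a single network conditioned on the joint state; the quadratic form is what lets NAF act greedily without an inner optimization over actions.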
