Paper Title
On the Search for Feedback in Reinforcement Learning
Paper Authors
Paper Abstract
The problem of Reinforcement Learning (RL) in an unknown nonlinear dynamical system is equivalent to the search for an optimal feedback law utilizing simulations/rollouts of the dynamical system. Most RL techniques search over a complex global nonlinear feedback parametrization, making them suffer from high training times as well as high variance. Instead, we advocate searching over a local feedback representation consisting of an open-loop control sequence and an associated optimal linear feedback law that is completely determined by the open-loop sequence. We show that this alternative approach results in highly efficient training, that the answers obtained are repeatable and hence reliable, and that the resulting closed-loop performance is superior to that of state-of-the-art global RL techniques. Finally, replanning whenever required, which is feasible due to the fast and reliable local solution, allows us to recover the global optimality of the resulting feedback law.
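The local feedback representation the abstract describes can be sketched as a nominal open-loop sequence plus a linear correction on the deviation from the nominal trajectory, i.e. u_t = ū_t + K_t (x_t − x̄_t). The following is a minimal illustrative sketch, not the paper's method: the toy dynamics `f`, the constant gains `K`, and all numerical values are assumptions chosen only to show the structure of such a feedback law (in the paper, the gains would be the optimal linear feedback determined by the open-loop sequence).

```python
import numpy as np

def f(x, u):
    # Toy scalar nonlinear dynamics (an assumption, for illustration only).
    return x + 0.1 * np.tanh(u) - 0.05 * x**3

def rollout_closed_loop(x0, x_bar, u_bar, K):
    """Roll out u_t = u_bar[t] + K[t] * (x_t - x_bar[t]) over the horizon."""
    x = x0
    traj = [x]
    for t in range(len(u_bar)):
        u = u_bar[t] + K[t] * (x - x_bar[t])  # open-loop + linear feedback
        x = f(x, u)
        traj.append(x)
    return np.array(traj)

# Nominal trajectory generated by the open-loop sequence alone.
T = 20
u_bar = np.zeros(T)
x_bar = [1.0]
for t in range(T):
    x_bar.append(f(x_bar[-1], u_bar[t]))
x_bar = np.array(x_bar)

# Placeholder stabilizing gains (hypothetical constants, not the
# optimal gains the paper computes from the open-loop solution).
K = -0.5 * np.ones(T)

# From a perturbed initial state, the linear feedback pulls the
# closed-loop rollout back toward the nominal trajectory.
traj = rollout_closed_loop(x0=1.2, x_bar=x_bar, u_bar=u_bar, K=K)
```

The point of the sketch is the parametrization: the search variable is the open-loop sequence `u_bar`, and the feedback term only corrects local deviations, which is what makes the representation cheap to train compared with a global nonlinear policy.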