Paper Title
Deep Q-learning: a robust control approach
Paper Authors
Paper Abstract
In this paper, we place deep Q-learning into a control-oriented perspective and study its learning dynamics with well-established techniques from robust control. We formulate an uncertain linear time-invariant model by means of the neural tangent kernel to describe learning. We show the instability of learning and analyze the agent's behavior in the frequency domain. Then, we ensure convergence via robust controllers acting as dynamical rewards in the loss function. We synthesize three controllers: a gain-scheduled state-feedback H2 controller, a dynamic H∞ controller, and a constant-gain H∞ controller. Setting up the learning agent with a control-oriented tuning methodology is more transparent and rests on a better-established literature than the heuristics common in reinforcement learning. In addition, our approach uses neither a target network nor a randomized replay memory. The role of the target network is taken over by the control input, which also exploits the temporal dependency of samples (as opposed to a randomized memory buffer). Numerical simulations in different OpenAI Gym environments suggest that H∞-controlled learning performs slightly better than double deep Q-learning.
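A minimal sketch of the central idea as described in the abstract: a deep Q-learning loss that uses no target network, with a controller output entering the TD target as an additive, dynamical reward. The network architecture, the function controlled_td_loss, and the variable u are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small fully connected Q-network (illustrative architecture)."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def controlled_td_loss(q_net, batch, u, gamma=0.99):
    """TD loss without a target network: the bootstrap target is computed
    with the online network itself, and the controller output `u` enters
    the target as an additive, dynamical reward term (hedged sketch)."""
    obs, act, rew, next_obs, done = batch  # tensors from a temporally ordered mini-batch
    q_pred = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = q_net(next_obs).max(dim=1).values
        target = rew + u + gamma * (1.0 - done) * q_next
    return nn.functional.mse_loss(q_pred, target)

# Illustrative usage: transitions are consumed in temporal order instead of
# being drawn from a randomized replay buffer.
# q_net = QNet(obs_dim=4, n_actions=2)
# loss = controlled_td_loss(q_net, batch, u=controller_output)
```

Here u would be produced by one of the robust controllers (H2 or H∞) synthesized from the neural-tangent-kernel-based linear time-invariant model of the learning dynamics; how that controller is synthesized is the subject of the paper itself.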