Paper Title

Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation

Paper Authors

Marc Abeille, Alessandro Lazaric

Paper Abstract

We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting. Inspired by the extended value iteration algorithm used in optimistic algorithms for finite MDPs, we propose to relax the optimistic optimization of OFU-LQ and cast it into a constrained extended LQR problem, where an additional control variable implicitly selects the system dynamics within a confidence interval. We then move to the corresponding Lagrangian formulation, for which we prove strong duality. As a result, we show that an $ε$-optimistic controller can be computed efficiently by solving at most $O\big(\log(1/ε)\big)$ Riccati equations. Finally, we prove that relaxing the original OFU problem does not impact the learning performance, thus recovering the $\tilde{O}(\sqrt{T})$ regret of OFU-LQ. To the best of our knowledge, this is the first computationally efficient confidence-based algorithm for LQR with worst-case optimal regret guarantees.
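
The computational claim above, that an $ε$-optimistic controller costs only $O\big(\log(1/ε)\big)$ Riccati solves, can be made concrete with a small sketch. The snippet below is not the paper's algorithm: it assumes a single scalar dual variable `lam`, a fixed perturbation direction `Delta` standing in for the confidence set, and a unimodal dual function, and it only illustrates the mechanism of a logarithmic-cost line search in which each evaluation is one discrete algebraic Riccati solve. All names here (`dual_value`, `eps_optimistic_gain`, `Delta`, `lam_max`) are hypothetical.

```python
# Minimal sketch, NOT the paper's exact formulation: a scalar-dual stand-in
# for the Lagrangian relaxation, where each dual evaluation is one Riccati solve.
import numpy as np
from scipy.linalg import solve_discrete_are


def dual_value(lam, A_hat, B, Q, R, Delta):
    """Hypothetical scalar dual: tilt the nominal dynamics along a fixed
    direction Delta (a stand-in for the confidence set) and solve one
    discrete algebraic Riccati equation for the resulting LQR."""
    A_lam = A_hat + lam * Delta
    P = solve_discrete_are(A_lam, B, Q, R)
    return np.trace(P), P, A_lam


def eps_optimistic_gain(A_hat, B, Q, R, Delta, lam_max=1.0, eps=1e-6):
    """Ternary search over lam in [0, lam_max], assuming the dual is unimodal.
    Each iteration shrinks the interval by a constant factor at the cost of
    O(1) Riccati solves, so accuracy eps is reached after O(log(1/eps))
    solves -- the complexity quoted in the abstract."""
    lo, hi = 0.0, lam_max
    while hi - lo > eps:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if dual_value(m1, A_hat, B, Q, R, Delta)[0] < dual_value(m2, A_hat, B, Q, R, Delta)[0]:
            hi = m2
        else:
            lo = m1
    _, P, A_lam = dual_value(0.5 * (lo + hi), A_hat, B, Q, R, Delta)
    # Standard LQR gain for the optimistically tilted dynamics.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A_lam)
    return K, P


# Toy usage: a marginally stable double integrator, small tilt direction.
A_hat = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
Delta = 0.05 * np.eye(2)  # hypothetical perturbation direction
K, P = eps_optimistic_gain(A_hat, B, Q, R, Delta, eps=1e-4)
print("gain:", K)
```

The design point this sketch tries to convey is the one the abstract emphasizes: once strong duality reduces the optimistic optimization to a search over a dual variable, the per-step cost is a single Riccati equation, so the whole procedure stays within a logarithmic number of solves rather than requiring a semidefinite program over the full confidence region.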
