Paper Title

Optimal Dynamic Regret in LQR Control

Authors

Dheeraj Baby, Yu-Xiang Wang

Abstract

We consider the problem of nonstochastic control with a sequence of quadratic losses, i.e., LQR control. We provide an efficient online algorithm that achieves an optimal dynamic (policy) regret of $\tilde{O}(\max\{n^{1/3} \mathcal{TV}(M_{1:n})^{2/3}, 1\})$, where $\mathcal{TV}(M_{1:n})$ is the total variation of any oracle sequence of Disturbance Action policies parameterized by $M_1,...,M_n$ -- chosen in hindsight to cater to unknown nonstationarity. The rate improves the best known rate of $\tilde{O}(\sqrt{n (\mathcal{TV}(M_{1:n})+1)} )$ for general convex losses and we prove that it is information-theoretically optimal for LQR. Main technical components include the reduction of LQR to online linear regression with delayed feedback due to Foster and Simchowitz (2020), as well as a new proper learning algorithm with an optimal $\tilde{O}(n^{1/3})$ dynamic regret on a family of ``minibatched'' quadratic losses, which could be of independent interest.
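As a rough numeric illustration (not part of the paper), the sketch below computes the total variation $\mathcal{TV}(M_{1:n})$ of a sequence of policy matrices as the sum of Frobenius-norm differences between consecutive matrices, and compares the two regret rates from the abstract with log factors dropped. The function names and the choice of Frobenius norm are illustrative assumptions.

```python
import math

def total_variation(Ms):
    """Illustrative TV(M_{1:n}): sum of Frobenius-norm differences
    between consecutive policy matrices (given as nested lists)."""
    tv = 0.0
    for prev, cur in zip(Ms, Ms[1:]):
        diff_sq = sum((a - b) ** 2
                      for row_p, row_c in zip(prev, cur)
                      for a, b in zip(row_p, row_c))
        tv += math.sqrt(diff_sq)
    return tv

def dynamic_regret_bounds(n, tv):
    """Compare the two rates from the abstract, ignoring log factors:
    LQR rate     ~ max{ n^{1/3} * TV^{2/3}, 1 }
    convex rate  ~ sqrt( n * (TV + 1) )"""
    lqr_rate = max(n ** (1.0 / 3.0) * tv ** (2.0 / 3.0), 1.0)
    convex_rate = math.sqrt(n * (tv + 1.0))
    return lqr_rate, convex_rate

if __name__ == "__main__":
    # Two 2x2 policy matrices: zero matrix, then identity -> TV = sqrt(2).
    Ms = [[[0.0, 0.0], [0.0, 0.0]],
          [[1.0, 0.0], [0.0, 1.0]]]
    print("TV =", total_variation(Ms))
    # For fixed TV, the n^{1/3} rate grows much more slowly than sqrt(n).
    print(dynamic_regret_bounds(10_000, 1.0))
```

For $n = 10{,}000$ and $\mathcal{TV} = 1$, the LQR-specific bound scales like $n^{1/3} \approx 21.5$ while the general convex bound scales like $\sqrt{2n} \approx 141$, which is the improvement the abstract highlights.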
