Paper Title

The Power of Linear Controllers in LQR Control

Paper Authors

Gautam Goel, Babak Hassibi

Paper Abstract

The Linear Quadratic Regulator (LQR) framework considers the problem of regulating a linear dynamical system perturbed by environmental noise. We compute the policy regret between three distinct control policies: i) the optimal online policy, whose linear structure is given by the Riccati equations; ii) the optimal offline linear policy, which is the best linear state feedback policy given the noise sequence; and iii) the optimal offline policy, which selects the globally optimal control actions given the noise sequence. We fully characterize the optimal offline policy and show that it has a recursive form in terms of the optimal online policy and future disturbances. We also show that the cost of the optimal offline linear policy converges to the cost of the optimal online policy as the time horizon grows large, and consequently the optimal offline linear policy incurs linear regret relative to the optimal offline policy, even in the optimistic setting where the noise is drawn i.i.d. from a known distribution. Although we focus on the setting where the noise is stochastic, our results also imply new lower bounds on the policy regret achievable when the noise is chosen by an adaptive adversary.
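
To make the abstract's comparison concrete, below is a minimal NumPy sketch (not the paper's code): it rolls out the optimal online policy, computed by the backward Riccati recursion, and contrasts its realized cost with the clairvoyant offline optimum, obtained by exactly minimizing the quadratic cost given the full noise sequence. The system matrices, horizon, noise scale, and terminal cost are illustrative assumptions; the optimal offline linear policy (ii) is omitted, since fitting a single fixed feedback gain to a realized noise sequence is in general a nonconvex problem.

```python
# Toy LQR instance (illustrative assumptions, not taken from the paper).
# Dynamics: x_{t+1} = A x_t + B u_t + w_t, cost: sum x'Qx + u'Ru + terminal x'Qx.
import numpy as np

rng = np.random.default_rng(0)

n, m, T = 2, 1, 100
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q, R = np.eye(n), 0.1 * np.eye(m)
x0 = np.zeros(n)
w = 0.1 * rng.standard_normal((T, n))  # i.i.d. disturbances, known offline

# i) Optimal online policy: backward Riccati recursion, u_t = -K_t x_t.
P = Q  # terminal cost matrix (assumed equal to Q here)
K = [None] * T
for t in range(T - 1, -1, -1):
    K[t] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K[t])

def total_cost(u_seq):
    """Roll a control sequence through the dynamics under the fixed noise w."""
    x, c = x0, 0.0
    for t in range(T):
        c += x @ Q @ x + u_seq[t] @ R @ u_seq[t]
        x = A @ x + B @ u_seq[t] + w[t]
    return c + x @ Q @ x  # terminal cost

# Simulate the online policy to record its realized control sequence.
x, u_online = x0, []
for t in range(T):
    u_online.append(-K[t] @ x)
    x = A @ x + B @ u_online[-1] + w[t]

# iii) Clairvoyant offline optimum: with w known, the cost is a convex
# quadratic in the stacked controls U, so the minimizer solves a linear
# system. Stack x_0..x_T as X = G U + c0, where c0 carries x0 and the noise.
G = np.zeros(((T + 1) * n, T * m))
c0 = np.zeros((T + 1) * n)
c0[:n] = x0
for t in range(1, T + 1):
    G[t * n:(t + 1) * n] = A @ G[(t - 1) * n:t * n]
    G[t * n:(t + 1) * n, (t - 1) * m:t * m] += B
    c0[t * n:(t + 1) * n] = A @ c0[(t - 1) * n:t * n] + w[t - 1]
Qbig = np.kron(np.eye(T + 1), Q)
Rbig = np.kron(np.eye(T), R)
U = np.linalg.solve(G.T @ Qbig @ G + Rbig, -G.T @ Qbig @ c0)
u_offline = U.reshape(T, m)

print("online (Riccati) cost:", total_cost(u_online))
print("offline optimal cost :", total_cost(u_offline))
print("policy regret        :", total_cost(u_online) - total_cost(u_offline))
```

Consistent with the abstract's claims, the printed regret of the online policy relative to the clairvoyant offline optimum should grow roughly linearly in the horizon T on runs like this.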
