Paper Title

Robust Phi-Divergence MDPs

Authors

Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

Abstract

In recent years, robust Markov decision processes (MDPs) have emerged as a prominent modeling framework for dynamic decision problems affected by uncertainty. In contrast to classical MDPs, which only account for stochasticity by modeling the dynamics through a stochastic process with a known transition kernel, robust MDPs additionally account for ambiguity by optimizing in view of the most adverse transition kernel from a prescribed ambiguity set. In this paper, we develop a novel solution framework for robust MDPs with s-rectangular ambiguity sets that decomposes the problem into a sequence of robust Bellman updates and simplex projections. Exploiting the rich structure present in the simplex projections corresponding to phi-divergence ambiguity sets, we show that the associated s-rectangular robust MDPs can be solved substantially faster than with state-of-the-art commercial solvers as well as a recent first-order solution scheme, thus rendering them attractive alternatives to classical MDPs in practical applications.
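To make the notion of a phi-divergence ambiguity set concrete, the following minimal sketch uses the standard textbook definition D_phi(p || p_nominal) = sum_i p_nominal[i] * phi(p[i] / p_nominal[i]) with the Kullback-Leibler generator phi(t) = t log t - t + 1. It only checks whether a candidate transition distribution lies in the set; it is not the paper's solution scheme. The function names, the radius `kappa`, and the example numbers are assumptions chosen for illustration.

```python
import math


def phi_kl(t):
    """KL generator phi(t) = t*log(t) - t + 1, with phi(0) = 1 by continuity."""
    return 1.0 if t == 0.0 else t * math.log(t) - t + 1.0


def phi_divergence(p, p_nominal, phi=phi_kl):
    """Return sum_i p_nominal[i] * phi(p[i] / p_nominal[i]).

    Assumes every entry of p_nominal is strictly positive.
    """
    return sum(q * phi(pi / q) for pi, q in zip(p, p_nominal))


def in_ambiguity_set(p, p_nominal, kappa, phi=phi_kl, tol=1e-9):
    """Check that p is a probability vector within divergence radius kappa."""
    on_simplex = all(pi >= -tol for pi in p) and abs(sum(p) - 1.0) <= tol
    return on_simplex and phi_divergence(p, p_nominal, phi) <= kappa + tol


# Hypothetical nominal transition distribution for one state-action pair,
# and a perturbed candidate distribution.
p_nominal = [0.5, 0.3, 0.2]
p_candidate = [0.4, 0.3, 0.3]

divergence = phi_divergence(p_candidate, p_nominal)
inside_loose = in_ambiguity_set(p_candidate, p_nominal, kappa=0.05)
inside_tight = in_ambiguity_set(p_candidate, p_nominal, kappa=0.01)
```

With these illustrative numbers the candidate lies inside the set of radius 0.05 but outside the set of radius 0.01. A robust MDP would evaluate each policy against the least favorable distribution in such a set rather than against the nominal one alone.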
