论文标题

分配强大的机会约束马尔可夫决策过程

Distributionally robust chance-constrained Markov decision processes

论文作者

Nguyen, Hoang Nam, Lisser, Abdel, Singh, Vikas Vikram

论文摘要

马尔可夫决策过程(MDP)是一个决策框架,决策者有兴趣最大程度地提高未来各州获得的奖励流的预期折扣价值,这些奖励是根据受控的马尔可夫连锁店访问的。文献中有许多包括线性编程方法在内的算法可以在奖励和过渡概率确定性时计算最佳策略。在本文中,我们考虑了一个MDP问题,其中已知过渡概率,奖励向量是一个随机向量,其分布是部分已知的。我们在各种基于矩的不确定性集中使用分布强大的机会约束优化框架来制定MDP问题,并使用Phi-Divergence和Wasserstein距离度量定义了基于统计距离的不确定性集。对于每种类型的不确定性集,我们考虑随机奖励向量具有全部支持或非负支持的情况。对于全面支持的情况,我们表明,分配强大的机会约束的马尔可夫决策过程等于矩和基于Phi-Divergence距离的不确定性集的二阶锥形编程问题,并且相当于基于WaseSertErstein距离的混合型二阶编程问题,基于WaseSertErstein距离设置。对于非负支持的情况,它等效于共同优化问题和基于矩的不确定性集和基于Wasserstein距离的不确定性集的共同优化问题。作为应用程序,我们研究了机器替换问题,并说明了随机生成的实例的数值实验。

Markov decision process (MDP) is a decision making framework where a decision maker is interested in maximizing the expected discounted value of a stream of rewards received at future stages at various states which are visited according to a controlled Markov chain. Many algorithms including linear programming methods are available in the literature to compute an optimal policy when the rewards and transition probabilities are deterministic. In this paper, we consider an MDP problem where the transition probabilities are known and the reward vector is a random vector whose distribution is partially known. We formulate the MDP problem using distributionally robust chance-constrained optimization framework under various types of moments based uncertainty sets, and statistical-distance based uncertainty sets defined using phi-divergence and Wasserstein distance metric. For each type of uncertainty set, we consider the case where a random reward vector has either a full support or a nonnegative support. For the case of full support, we show that the distributionally robust chance-constrained Markov decision process is equivalent to a second-order cone programming problem for the moments and phi-divergence distance based uncertainty sets, and it is equivalent to a mixed-integer second-order cone programming problem for an Wasserstein distance based uncertainty set. For the case of nonnegative support, it is equivalent to a copositive optimization problem and a biconvex optimization problem for the moments based uncertainty sets and Wasserstein distance based uncertainty set, respectively. As an application, we study a machine replacement problem and illustrate numerical experiments on randomly generated instances.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源