Paper title
Conformal Prediction Intervals for Markov Decision Process Trajectories
Authors
Abstract
Before delegating a task to an autonomous system, a human operator may want a guarantee about the behavior of the system. This paper extends previous work on conformal prediction for functional data and conformalized quantile regression to provide conformal prediction intervals over the future behavior of an autonomous system executing a fixed control policy on a Markov Decision Process (MDP). The prediction intervals are constructed by applying conformal corrections to prediction intervals computed by quantile regression. The resulting intervals guarantee that with probability $1-\delta$ the observed trajectory will lie inside the prediction interval, where the probability is computed with respect to the starting state distribution and the stochasticity of the MDP. The method is illustrated on MDPs for invasive species management and StarCraft2 battles.
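To make the idea of a conformal correction concrete, the following is a minimal sketch of a generic split-conformal adjustment applied to per-time-step quantile bands. It is not the paper's algorithm. The random-walk simulator, the empirical per-step quantiles standing in for quantile regression, the max-over-time violation score, and the names `conformal_band` and `simulate` are all assumptions made for illustration.

```python
import math
import numpy as np


def conformal_band(cal_traj, cal_lo, cal_hi, test_lo, test_hi, delta):
    """Widen quantile bands by one conformal margin (illustrative sketch).

    cal_traj: (n, T) calibration trajectories.
    cal_lo, cal_hi: lower/upper bands for the calibration trajectories,
        shape (T,) or (n, T).
    test_lo, test_hi: bands for a new trajectory, shape (T,).
    """
    # Score of a trajectory: its worst violation of the band over all
    # time steps (negative if it stays strictly inside everywhere).
    violation = np.maximum(cal_lo - cal_traj, cal_traj - cal_hi)
    scores = violation.max(axis=1)
    n = scores.shape[0]
    # Finite-sample rank used by split conformal prediction.
    k = math.ceil((n + 1) * (1 - delta))
    if k > n:
        raise ValueError("too few calibration trajectories for this delta")
    q = np.sort(scores)[k - 1]
    # Shift both band edges outward by the same margin q.
    return test_lo - q, test_hi + q


rng = np.random.default_rng(0)
T = 20  # trajectory length


def simulate(n):
    # Hypothetical stand-in for rolling out a fixed policy on an MDP:
    # a Gaussian random walk started from a fixed state.
    return np.cumsum(rng.normal(size=(n, T)), axis=1)


train, cal, test = simulate(500), simulate(500), simulate(2000)

# Stand-in for quantile regression: empirical per-step quantiles.
lo = np.quantile(train, 0.05, axis=0)
hi = np.quantile(train, 0.95, axis=0)

delta = 0.1
band_lo, band_hi = conformal_band(cal, lo, hi, lo, hi, delta)

# Fraction of held-out trajectories lying entirely inside the band.
inside = np.all((test >= band_lo) & (test <= band_hi), axis=1)
coverage = inside.mean()
print(f"empirical trajectory coverage: {coverage:.3f}")
```

Because the score is the maximum violation over the whole horizon, the coverage statement in this sketch applies to the entire trajectory and not to each time step separately. That matches the kind of guarantee described in the abstract, although the paper's actual construction may differ.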