Paper Title

Deterministic Sequencing of Exploration and Exploitation for Reinforcement Learning

Authors

Piyush Gupta and Vaibhav Srivastava

Abstract

We propose the Deterministic Sequencing of Exploration and Exploitation (DSEE) algorithm, which interleaves exploration and exploitation epochs, for model-based RL problems that aim to simultaneously learn the system model, i.e., a Markov decision process (MDP), and the associated optimal policy. During exploration epochs, DSEE explores the environment and updates its estimates of the expected rewards and transition probabilities. During exploitation epochs, the latest estimates of the expected rewards and transition probabilities are used to obtain a robust policy with high probability. We design the lengths of the exploration and exploitation epochs such that the cumulative regret grows as a sub-linear function of time.
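To make the interleaving structure concrete, the following is a minimal, hypothetical sketch of a DSEE-style loop on a small tabular MDP. It is not the paper's exact algorithm: the MDP, the geometric epoch lengths (`2**k` exploration steps, `4**k` exploitation steps), the uniform-random exploration policy, and the plain value-iteration planner are all illustrative assumptions; the paper's specific epoch schedule and robust-policy computation are not reproduced here.

```python
import numpy as np

def dsee_sketch(n_states=3, n_actions=2, n_epochs=4, gamma=0.9, seed=0):
    """Illustrative DSEE-style loop: alternate exploration and exploitation
    epochs, re-estimating the MDP between them. All numeric choices below
    are assumptions for the sketch, not the paper's design."""
    rng = np.random.default_rng(seed)
    # A random ground-truth MDP (assumed tabular with known state/action sets).
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(size=(n_states, n_actions))

    # Empirical statistics gathered during exploration.
    counts = np.zeros((n_states, n_actions, n_states))
    r_sum = np.zeros((n_states, n_actions))

    def estimates():
        # Empirical transition probabilities and mean rewards;
        # unvisited pairs fall back to a uniform/zero prior.
        n = counts.sum(axis=2)
        P_hat = np.where(n[..., None] > 0,
                         counts / np.maximum(n, 1)[..., None],
                         1.0 / n_states)
        R_hat = np.where(n > 0, r_sum / np.maximum(n, 1), 0.0)
        return P_hat, R_hat

    def greedy_policy(P_hat, R_hat, iters=100):
        # Value iteration on the *estimated* MDP, then act greedily.
        V = np.zeros(n_states)
        for _ in range(iters):
            Q = R_hat + gamma * P_hat @ V   # shape (S, A)
            V = Q.max(axis=1)
        return Q.argmax(axis=1)

    s = 0
    pi = np.zeros(n_states, dtype=int)
    for k in range(1, n_epochs + 1):
        # Exploration epoch: random actions, update reward/transition estimates.
        for _ in range(2 ** k):             # assumed geometric epoch growth
            a = rng.integers(n_actions)
            s2 = rng.choice(n_states, p=P[s, a])
            counts[s, a, s2] += 1
            r_sum[s, a] += R[s, a]
            s = s2
        # Exploitation epoch: follow the greedy policy for the latest estimates.
        P_hat, R_hat = estimates()
        pi = greedy_policy(P_hat, R_hat)
        for _ in range(4 ** k):             # exploitation epochs grow faster
            s = rng.choice(n_states, p=P[s, pi[s]])
    return pi
```

Because the exploitation epochs grow faster than the exploration epochs, the fraction of time spent exploring shrinks over the horizon, which is the mechanism behind the sub-linear cumulative regret the abstract describes.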
