Monte Carlo Tree Search guided by Symbolic Advice for MDPs

Paper Title

Monte Carlo Tree Search guided by Symbolic Advice for MDPs

Authors

Busatto-Gaston, Damien; Chakraborty, Debraj; Raskin, Jean-Francois

Abstract

In this paper, we consider the online computation of a strategy that aims to optimize the expected average reward in a Markov decision process (MDP). The strategy is computed with a receding horizon using Monte Carlo tree search (MCTS). We augment the MCTS algorithm with the notion of symbolic advice and show that its classical theoretical guarantees are maintained. Symbolic advice is used to bias the selection and simulation strategies of MCTS. We describe how to use QBF and SAT solvers to implement symbolic advice efficiently. We illustrate our new algorithm on the popular game Pac-Man and show that its performance exceeds that of plain MCTS as well as that of human players.
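The idea of advice biasing MCTS can be illustrated with a small sketch. This is not the authors' implementation. It makes several assumptions:

- It uses a UCT-style MCTS (UCB1 selection, random rollouts) with a finite-horizon total reward rather than the average reward.
- It uses a hypothetical toy corridor MDP.
- It uses a hypothetical per-step predicate `advice(state, action)` as a stand-in for symbolic advice. In the paper, advice constrains whole paths and is evaluated with QBF/SAT solvers.

All names (`step`, `advice`, `allowed`, `search`, `mcts_action`, `run_episode`) are illustrative. The advice prunes actions both during selection and during simulation. The receding horizon is modeled by re-running the search before every real step.

```python
import math
import random

# Hypothetical toy MDP (not from the paper): a corridor of positions 0..5.
# Position 0 is a trap (reward -10) and position 5 is the goal (reward +1);
# both end the episode. With probability 0.2 the agent slips and stays put.
ACTIONS = (-1, 1)
TRAP, GOAL = 0, 5

def step(state, action, rng):
    """Sample (next_state, reward) from a non-terminal state."""
    nxt = state if rng.random() < 0.2 else state + action
    reward = -10.0 if nxt == TRAP else (1.0 if nxt == GOAL else 0.0)
    return nxt, reward

def advice(state, action):
    """Hypothetical per-step stand-in for symbolic advice: never step into
    the trap. The paper's advice constrains paths and is checked with
    QBF/SAT solvers; this predicate is a deliberate simplification."""
    return state + action != TRAP

def allowed(state):
    """Actions permitted by the advice (all actions if it forbids everything)."""
    acts = [a for a in ACTIONS if advice(state, a)]
    return acts or list(ACTIONS)

def simulate(state, depth, rng, actions):
    """Simulation (rollout): play uniformly among the permitted actions."""
    total = 0.0
    for _ in range(depth):
        if state in (TRAP, GOAL):
            break
        state, r = step(state, rng.choice(actions(state)), rng)
        total += r
    return total

def search(history, state, depth, stats, rng, actions, c):
    """One UCT iteration from `state`; returns the sampled return."""
    if depth == 0 or state in (TRAP, GOAL):
        return 0.0
    node = stats.get(history)
    if node is None:
        # Expansion: add the node, then estimate its value with a rollout.
        stats[history] = {a: [0, 0.0] for a in actions(state)}
        return simulate(state, depth, rng, actions)
    visits = sum(n for n, _ in node.values())

    def ucb(a):
        n, q = node[a]
        if n == 0:
            return math.inf  # try every permitted action once
        return q / n + c * math.sqrt(math.log(visits) / n)

    # Selection: UCB1 restricted to the actions the advice permits.
    a = max(node, key=ucb)
    nxt, r = step(state, a, rng)
    ret = r + search(history + (a, nxt), nxt, depth - 1, stats, rng, actions, c)
    node[a][0] += 1  # backpropagation: visit count
    node[a][1] += ret  # backpropagation: accumulated return
    return ret

def mcts_action(state, horizon=8, iterations=500, seed=0, use_advice=True, c=1.4):
    """Run MCTS over a finite horizon and return the most-visited root action."""
    rng = random.Random(seed)
    actions = allowed if use_advice else (lambda s: list(ACTIONS))
    stats = {}
    for _ in range(iterations):
        search((state,), state, horizon, stats, rng, actions, c)
    root = stats[(state,)]
    return max(root, key=lambda a: root[a][0])

def run_episode(start=2, max_steps=20, seed=1, use_advice=True):
    """Receding horizon: re-plan with MCTS before every real step."""
    rng = random.Random(seed)
    state, total = start, 0.0
    for t in range(max_steps):
        if state in (TRAP, GOAL):
            break
        a = mcts_action(state, seed=seed + t, use_advice=use_advice)
        state, r = step(state, a, rng)
        total += r
    return state, total
```

For example, `mcts_action(1)` returns `1`, because at position 1 the advice leaves only the rightward move. With `use_advice=False`, the search has to discover from samples that moving left is costly. With the advice enabled, `run_episode` can never end in the trap. This mirrors, in a much simplified form, how advice restricts the tree and the rollouts to behaviors deemed safe.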
