Paper Title
Risk-Sensitive Reinforcement Learning with Exponential Criteria
Paper Authors
Paper Abstract
While reinforcement learning has shown experimental success in a number of applications, it is known to be sensitive to noise and perturbations in the parameters of the system, which can lead to high variance in the total reward across episodes run in slightly different environments. To introduce robustness, as well as sample efficiency, risk-sensitive reinforcement learning methods have been studied extensively. In this work, we provide a definition of robust reinforcement learning policies and formulate a risk-sensitive reinforcement learning problem to approximate them by solving an optimization problem with respect to a modified objective based on exponential criteria. In particular, we study a model-free risk-sensitive variation of the widely used Monte Carlo Policy Gradient algorithm and introduce a novel risk-sensitive online Actor-Critic algorithm based on solving a multiplicative Bellman equation using stochastic approximation updates. Analytical results suggest that the use of exponential criteria generalizes commonly used ad hoc regularization approaches, improves sample efficiency, and introduces robustness with respect to perturbations in the model parameters and the environment. The implementation, performance, and robustness properties of the proposed methods are evaluated in simulated experiments.
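The abstract does not state the objective explicitly, so as a point of reference, the exponential (risk-sensitive) criterion it refers to is conventionally defined as below; the Taylor expansion around beta = 0 is the standard argument for why such criteria subsume mean-variance style regularization (beta < 0 risk-averse, beta > 0 risk-seeking). The paper's exact normalization and sign conventions may differ.

```latex
% Conventional form of the exponential criterion over the episode return R;
% the paper's exact objective may differ in normalization or sign conventions.
J_\beta(\pi) = \frac{1}{\beta}\,\log \mathbb{E}_\pi\!\left[e^{\beta R}\right],
\qquad R = \sum_{t=0}^{T} \gamma^{t} r_t,
% and a Taylor expansion around \beta = 0 gives
J_\beta(\pi) = \mathbb{E}_\pi[R] + \frac{\beta}{2}\,\mathrm{Var}_\pi(R) + O(\beta^{2}).
```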
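For concreteness, here is a minimal sketch of how a risk-sensitive variation of Monte Carlo Policy Gradient (REINFORCE) can weight its updates under an exponential criterion. The function, its interface, and the exact weighting are illustrative assumptions consistent with the criterion above, not the authors' published update rule.

```python
import numpy as np

def risk_sensitive_pg_estimate(score_sums, returns, beta):
    """Sketch of a risk-sensitive Monte Carlo policy-gradient estimate.

    score_sums : (n_episodes, n_params) array; each row is
                 sum_t grad_theta log pi_theta(a_t | s_t) for one episode.
    returns    : (n_episodes,) array of total episode returns R.
    beta       : risk parameter; beta < 0 is risk-averse, beta > 0 is
                 risk-seeking.

    Instead of weighting each episode's score by its return R (standard
    REINFORCE), the score is weighted by (exp(beta * R) - 1) / beta.
    Subtracting the constant 1/beta acts as a baseline (it leaves the
    gradient unbiased because E[grad log pi] = 0), and the weight tends
    to R as beta -> 0, recovering plain REINFORCE in the limit.
    """
    weights = (np.exp(beta * returns) - 1.0) / beta
    return (weights[:, None] * score_sums).mean(axis=0)

# Toy usage: 4 episodes, a 3-parameter policy.
rng = np.random.default_rng(0)
g = risk_sensitive_pg_estimate(rng.normal(size=(4, 3)),
                               rng.uniform(0, 10, size=4),
                               beta=-0.1)
print(g)  # ascent direction: theta <- theta + learning_rate * g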
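Similarly, a hedged sketch of the kind of stochastic-approximation critic update a multiplicative Bellman equation suggests. The recursion shape V(s) = E[exp(beta * r) V(s')] is inferred from the term "multiplicative Bellman equation" and the exponential criterion; terminal handling, discounting, and the paired actor update are omitted and may differ from the paper's algorithm.

```python
import numpy as np

# Tabular critic for a multiplicative Bellman equation of the assumed form
#     V(s) = E[ exp(beta * r) * V(s') ]
# updated by stochastic approximation from observed transitions (s, r, s').

def multiplicative_td_step(V, s, r, s_next, beta, alpha):
    target = np.exp(beta * r) * V[s_next]  # multiplicative backup
    V[s] += alpha * (target - V[s])        # Robbins-Monro style correction
    return V

n_states = 5
V = np.ones(n_states)  # the multiplicative identity is the natural init
V = multiplicative_td_step(V, s=0, r=1.0, s_next=1, beta=-0.1, alpha=0.05)
```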