Paper Title
Delayed Geometric Discounts: An Alternative Criterion for Reinforcement Learning
Paper Authors
Paper Abstract
The endeavor of artificial intelligence (AI) is to design autonomous agents capable of achieving complex tasks. To this end, reinforcement learning (RL) provides a theoretical framework for learning optimal behaviors. In practice, RL algorithms rely on geometric discounting to evaluate this optimality. Unfortunately, this does not cover decision processes where future returns are not exponentially less valuable. Depending on the problem, this limitation induces sample inefficiency (as feedback is exponentially decayed) and requires additional curricula/exploration mechanisms (to deal with sparse, deceptive, or adversarial rewards). In this paper, we tackle these issues by generalizing the discounted problem formulation with a family of delayed objective functions. We investigate the underlying RL problem to derive: 1) the optimal stationary solution and 2) an approximation of the optimal non-stationary control. The devised algorithms solve hard exploration problems in tabular environments and improve sample efficiency on classic simulated robotics benchmarks.
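For context, the standard criterion the abstract refers to weights a reward received at step t by a geometric factor, so the discounted return is (in LaTeX notation)

J(\pi) = \mathbb{E}_\pi\Big[\sum_{t=0}^{\infty} \gamma^t \, r_t\Big], \qquad \gamma \in (0, 1).

As an illustration only (the exact family of delayed objectives is defined in the paper, not in this abstract), one hedged reading of "delayed geometric discounts" is an objective whose geometric decay starts only after a delay of D steps, e.g.

J_D(\pi) = \mathbb{E}_\pi\Big[\sum_{t=0}^{\infty} \gamma^{\max(t - D,\, 0)} \, r_t\Big],

which recovers the classical criterion at D = 0 and values near-future rewards uniformly for larger D. The symbol D and this particular functional form are assumptions made here for illustration, not the paper's stated definition.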