Paper Title
Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices
Paper Authors

Paper Abstract
The goal of meta-reinforcement learning (meta-RL) is to build agents that can quickly learn new tasks by leveraging prior experience on related tasks. Learning a new task often requires both exploring to gather task-relevant information and exploiting this information to solve the task. In principle, optimal exploration and exploitation can be learned end-to-end by simply maximizing task performance. However, such meta-RL approaches struggle with local optima due to a chicken-and-egg problem: learning to explore requires good exploitation to gauge the exploration's utility, but learning to exploit requires information gathered via exploration. Optimizing separate objectives for exploration and exploitation can avoid this problem, but prior meta-RL exploration objectives yield suboptimal policies that gather information irrelevant to the task. We alleviate both concerns by constructing an exploitation objective that automatically identifies task-relevant information and an exploration objective to recover only this information. This avoids local optima in end-to-end training, without sacrificing optimal exploration. Empirically, our method, DREAM, substantially outperforms existing approaches on complex meta-RL problems, such as sparse-reward 3D visual navigation. Videos of DREAM: https://ezliu.github.io/dream/
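The core idea described in the abstract, an exploitation objective that distills task-relevant information into an embedding, and an exploration objective rewarded only for recovering that embedding, can be illustrated with a minimal sketch. This is not the authors' implementation; the class names (TaskEncoder, ExploitPolicy, TrajectoryEncoder), the dimensions, and the use of an embedding-prediction error as the exploration reward are all assumptions made for illustration.

```python
# Minimal sketch (assumed, not the DREAM codebase) of decoupled
# exploration/exploitation objectives for meta-RL.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, NUM_TASKS, Z_DIM = 16, 4, 10, 8  # hypothetical sizes


class TaskEncoder(nn.Module):
    """Exploitation side: maps a task identifier to a task embedding z."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(NUM_TASKS, Z_DIM)

    def forward(self, task_id):
        return self.embed(task_id)


class ExploitPolicy(nn.Module):
    """Exploitation policy pi(a | s, z): acts given state and task embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + Z_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM),
        )

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))  # action logits


class TrajectoryEncoder(nn.Module):
    """Exploration side: predicts the task embedding from an exploration trajectory."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(STATE_DIM + ACTION_DIM, 64, batch_first=True)
        self.head = nn.Linear(64, Z_DIM)

    def forward(self, traj):  # traj: (batch, T, STATE_DIM + ACTION_DIM)
        _, h = self.rnn(traj)
        return self.head(h[-1])


def exploration_reward(traj_encoder, traj, z_target):
    """Reward is higher when the exploration trajectory lets us recover z,
    so the exploration policy is only paid for task-relevant information."""
    z_pred = traj_encoder(traj)
    return -F.mse_loss(z_pred, z_target.detach(), reduction="none").sum(-1)
```

In this sketch, the exploitation policy and task encoder would be trained with an ordinary RL loss (plus, in the paper's spirit, a bottleneck so z keeps only task-relevant bits), while the exploration policy is trained with `exploration_reward` as its reward signal; the two objectives never have to bootstrap off each other, which is how the abstract's chicken-and-egg problem is avoided.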