Paper title
Planning with RL and episodic-memory behavioral priors
Paper authors
Paper abstract
The practical application of learning agents requires sample-efficient and interpretable algorithms. Learning from behavioral priors is a promising way to bootstrap agents with a better-than-random exploration policy or a safeguard against the pitfalls of early learning. Existing solutions for imitation learning require a large number of expert demonstrations and rely on hard-to-interpret learning methods such as deep Q-learning. In this work, we present a planning-based approach that can use these behavioral priors for effective exploration and learning in a reinforcement learning environment. We demonstrate that curated exploration policies in the form of behavioral priors can help an agent learn faster.