Paper Title
Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter
Paper Authors
Paper Abstract
Reinforcement learning agents must generalize beyond their training experience. Prior work has focused mostly on identical training and evaluation environments. Starting from the recently introduced Crafter benchmark, a 2D open-world survival game, we introduce a new set of environments suitable for evaluating agents' ability to generalize to previously unseen (numbers of) objects and to adapt quickly (meta-learning). In Crafter, agents are evaluated by the number of achievements they unlock (such as collecting resources) when trained for 1M steps. We show that current agents struggle to generalize, and introduce novel object-centric agents that improve over strong baselines. Through several experiments, we also provide critical insights of general interest for future work on Crafter. We show that careful hyperparameter tuning improves the PPO baseline agent by a large margin and that even feedforward agents can unlock almost all achievements by relying on the inventory display. We achieve new state-of-the-art performance on the original Crafter environment. Additionally, when trained beyond 1M steps, our tuned agents can unlock almost all achievements. We show that recurrent PPO agents improve over feedforward ones, even with the inventory information removed. We introduce CrafterOOD, a set of 15 new environments that evaluate out-of-distribution (OOD) generalization. On CrafterOOD, we show that current agents fail to generalize, whereas our novel object-centric agents achieve state-of-the-art OOD generalization while also being interpretable. Our code is publicly available.