Paper Title

Learning Robust State Abstractions for Hidden-Parameter Block MDPs

Authors

Zhang, Amy, Sodhani, Shagun, Khetarpal, Khimya, Pineau, Joelle

Abstract

Many control tasks exhibit similar dynamics that can be modeled as sharing common latent structure. Hidden-Parameter Markov Decision Processes (HiP-MDPs) explicitly model this structure to improve sample efficiency in multi-task settings. However, this setting makes strong assumptions about the observability of the state, which limit its application in real-world scenarios with rich observation spaces. In this work, we leverage ideas of common structure from the HiP-MDP setting and extend it to enable robust state abstractions inspired by Block MDPs. We derive instantiations of this new framework for both multi-task reinforcement learning (MTRL) and meta-reinforcement learning (Meta-RL) settings. Further, we provide transfer and generalization bounds based on task and state similarity, along with sample complexity bounds that depend on the aggregate number of samples across tasks rather than the number of tasks, a significant improvement over prior work that uses the same environment assumptions. To further demonstrate the efficacy of the proposed method, we empirically compare against, and show improvement over, multi-task and meta-reinforcement learning baselines.
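For readers unfamiliar with the two formalisms the abstract combines, a standard sketch follows (notation is assumed here, not taken from this paper): a HiP-MDP is a family of MDPs indexed by a hidden task parameter, while a Block MDP adds a rich observation space generated from an unobserved latent state.

```latex
% HiP-MDP: each task b draws a hidden parameter \theta_b that indexes
% the shared dynamics and reward of an otherwise common MDP.
\mathcal{M}_{\theta_b} = \langle S, A, T_{\theta_b}, R_{\theta_b}, \gamma \rangle,
\qquad \theta_b \sim P_{\Theta},
\qquad s' \sim T_{\theta_b}(\cdot \mid s, a).

% Block MDP: the agent observes x, not s; each observation is emitted
% from exactly one latent state (the "block" assumption).
x \sim q(\cdot \mid s), \qquad
\operatorname{supp} q(\cdot \mid s_1) \cap \operatorname{supp} q(\cdot \mid s_2) = \emptyset
\ \text{for } s_1 \neq s_2.
```

The paper's setting can be read as combining the two: tasks share latent structure through $\theta_b$, while the learned state abstraction must recover the latent state $s$ from rich observations $x$.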
