Paper Title
Learning Efficiently Function Approximation for Contextual MDP
Paper Authors
Paper Abstract
We study learning in contextual MDPs using function approximation for both the rewards and the dynamics. We consider both the case where the dynamics depend on the context and the case where they are independent of it. For both models we derive polynomial sample and time complexity (assuming an efficient ERM oracle). Our methodology gives a general reduction from learning contextual MDPs to supervised learning.
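The abstract assumes access to an efficient ERM oracle over a function class. As a minimal illustration of what such an oracle does (this is a hypothetical sketch, not the paper's actual construction), the snippet below fits a context-dependent reward predictor by minimizing empirical squared loss over a small finite hypothesis class:

```python
# Hypothetical sketch of an ERM oracle over a finite function class,
# in the spirit of the reduction from contextual-MDP learning to
# supervised learning. All names and the toy data are illustrative.

def erm_oracle(hypothesis_class, samples):
    """Return the hypothesis minimizing empirical squared loss.

    hypothesis_class: iterable of callables h(context, state, action) -> prediction
    samples: list of ((context, state, action), target) pairs
    """
    def empirical_loss(h):
        return sum((h(*x) - y) ** 2 for x, y in samples) / len(samples)

    return min(hypothesis_class, key=empirical_loss)

# Toy usage: pick the context-dependent reward model that best fits the data.
H = [lambda c, s, a: 0.5 * c, lambda c, s, a: 1.0 * c]
data = [((1.0, 0, 0), 1.0), ((2.0, 1, 0), 2.0)]
best = erm_oracle(H, data)  # the second hypothesis fits with zero loss
```

In the paper's setting, the same oracle abstraction would be invoked on samples of rewards (and, for the context-dependent model, transitions) collected from the contextual MDP; here it simply demonstrates the supervised-learning primitive being assumed.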