Adaptive Estimator Selection for Off-Policy Evaluation

Paper Title

Adaptive Estimator Selection for Off-Policy Evaluation

Authors

Yi Su, Pavithra Srinath, Akshay Krishnamurthy

Abstract

We develop a generic data-driven method for estimator selection in off-policy policy evaluation settings. We establish a strong performance guarantee for the method, showing that it is competitive with the oracle estimator, up to a constant factor. Via in-depth case studies in contextual bandits and reinforcement learning, we demonstrate the generality and applicability of the method. We also perform comprehensive experiments, demonstrating the empirical efficacy of our approach and comparing with related approaches. In both case studies, our method compares favorably with existing methods.
