Towards Robust Off-Policy Evaluation via Human Inputs

Paper Title

Towards Robust Off-Policy Evaluation via Human Inputs

Authors

Harvineet Singh, Shalmali Joshi, Finale Doshi-Velez, Himabindu Lakkaraju

Abstract

Off-policy Evaluation (OPE) methods are crucial tools for evaluating policies in high-stakes domains such as healthcare, where direct deployment is often infeasible, unethical, or expensive. When deployment environments are expected to undergo changes (that is, dataset shifts), it is important for OPE methods to perform robust evaluation of the policies amidst such changes. Existing approaches consider robustness against a large class of shifts that can arbitrarily change any observable property of the environment. This often results in highly pessimistic estimates of the utilities, thereby invalidating policies that might have been useful in deployment. In this work, we address the aforementioned problem by investigating how domain knowledge can help provide more realistic estimates of the utilities of policies. We leverage human inputs on which aspects of the environments may plausibly change, and adapt the OPE methods to only consider shifts on these aspects. Specifically, we propose a novel framework, Robust OPE (ROPE), which considers shifts on a subset of covariates in the data based on user inputs, and estimates worst-case utility under these shifts. We then develop computationally efficient algorithms for OPE that are robust to the aforementioned shifts for contextual bandits and Markov decision processes. We also theoretically analyze the sample complexity of these algorithms. Extensive experimentation with synthetic and real world datasets from the healthcare domain demonstrates that our approach not only captures realistic dataset shifts accurately, but also results in less pessimistic policy evaluations.
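
The idea of estimating worst-case utility under shifts restricted to user-chosen covariates can be illustrated with a toy sketch. This is not the ROPE algorithm from the paper. It assumes a contextual bandit with a single discrete covariate (called `group` here), a plain inverse propensity scoring (IPS) estimator, and a hypothetical bound `max_ratio` on how much any group's share of the population may grow after the shift. All function and parameter names are illustrative.

```python
from collections import defaultdict


def ips_value_by_group(logs, target_policy):
    """Per-group IPS estimate of the target policy's mean reward.

    logs: iterable of (group, action, reward, logging_prob) tuples, where
          logging_prob is the logging policy's probability of the logged action.
    target_policy(group, action): probability of `action` under the target policy.
    Returns (share, value): observed group shares and per-group IPS values.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, action, reward, logging_prob in logs:
        weight = target_policy(group, action) / logging_prob
        totals[group] += weight * reward
        counts[group] += 1
    n = sum(counts.values())
    share = {g: counts[g] / n for g in counts}
    value = {g: totals[g] / counts[g] for g in counts}
    return share, value


def worst_case_utility(share, value, max_ratio):
    """Lowest utility over shifted group shares q.

    Assumed shift set (an illustration, not the paper's definition):
    0 <= q[g] <= max_ratio * share[g] for every group, and sum(q) == 1.
    The minimum of this linear program is reached greedily by moving as
    much probability mass as allowed onto the lowest-value groups first.
    """
    if max_ratio < 1.0:
        raise ValueError("max_ratio must be at least 1 so the shares can sum to 1")
    remaining = 1.0
    utility = 0.0
    for g in sorted(value, key=value.get):  # lowest-value groups first
        mass = min(max_ratio * share[g], remaining)
        utility += mass * value[g]
        remaining -= mass
        if remaining <= 0.0:
            break
    return utility


# Toy logged data: the logging policy picks each of two actions with probability 0.5.
logs = [
    ("a", 1, 1.0, 0.5),
    ("a", 0, 0.0, 0.5),
    ("b", 1, 0.0, 0.5),
    ("b", 0, 0.0, 0.5),
]


def always_action_one(group, action):
    """Hypothetical target policy that always plays action 1."""
    return 1.0 if action == 1 else 0.0


share, value = ips_value_by_group(logs, always_action_one)
no_shift = worst_case_utility(share, value, max_ratio=1.0)
bounded_shift = worst_case_utility(share, value, max_ratio=1.5)
```

With `max_ratio=1.0` no shift is allowed and the result is the ordinary IPS estimate. Larger values of `max_ratio` admit larger shifts and give more pessimistic estimates, which is the trade-off the abstract describes when it contrasts arbitrary shifts with shifts limited to plausible covariates.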
