组合因果土匪

论文标题

组合因果土匪

Combinatorial Causal Bandits

论文作者

Feng, Shi, Chen, Wei

论文摘要

在组合因果土匪（CCB）中，学习代理在每轮中最多选择$ k $变量进行干预，从观察到的变量中收集反馈，目的是最大程度地减少对目标变量$ y $的预期遗憾。我们在因果模型的简洁参数表示的二元广义线性模型（BGLM）的背景下进行研究。我们根据最大似然估计方法提供了Markovian BGLMS（即没有隐藏变量）的算法BGLM-OFU，并表明它可以实现$ O（\ sqrt {t} \ log t）$遗憾，其中$ t $是$ t $的时间范围。对于具有隐藏变量的线性模型的特殊情况，我们应用因果推理技术，例如DO-Calculus将原始模型转换为马尔可夫模型，然后证明我们的BGLM OFFU算法和另一种基于其他具有隐藏变量模型的线性求解的基于线性回归的算法。 Our novelty includes (a) considering the combinatorial intervention action space and the general causal models including ones with hidden variables, (b) integrating and adapting techniques from diverse studies such as generalized linear bandits and online influence maximization, and (c) avoiding unrealistic assumptions (such as knowing the joint distribution of the parents of $Y$ under all interventions) and regret factors exponential to causal graph size in prior studies.

In combinatorial causal bandits (CCB), the learning agent chooses at most $K$ variables in each round to intervene, collects feedback from the observed variables, with the goal of minimizing expected regret on the target variable $Y$. We study under the context of binary generalized linear models (BGLMs) with a succinct parametric representation of the causal models. We present the algorithm BGLM-OFU for Markovian BGLMs (i.e. no hidden variables) based on the maximum likelihood estimation method, and show that it achieves $O(\sqrt{T}\log T)$ regret, where $T$ is the time horizon. For the special case of linear models with hidden variables, we apply causal inference techniques such as the do-calculus to convert the original model into a Markovian model, and then show that our BGLM-OFU algorithm and another algorithm based on the linear regression both solve such linear models with hidden variables. Our novelty includes (a) considering the combinatorial intervention action space and the general causal models including ones with hidden variables, (b) integrating and adapting techniques from diverse studies such as generalized linear bandits and online influence maximization, and (c) avoiding unrealistic assumptions (such as knowing the joint distribution of the parents of $Y$ under all interventions) and regret factors exponential to causal graph size in prior studies.

下载PDF全文

下载文献需遵守相关版权规定

论文标题