Paper Title
Pigeonhole Design: Balancing Sequential Experiments from an Online Matching Perspective
Paper Authors
Paper Abstract
Practitioners and academics have long appreciated the benefits of covariate balancing when conducting randomized experiments. For web-facing firms running online A/B tests, however, balancing covariate information remains challenging when experimental subjects arrive sequentially. In this paper, we study an online experimental design problem, which we refer to as the "Online Blocking Problem." In this problem, experimental subjects with heterogeneous covariate information arrive sequentially and must be immediately assigned to either the control group or the treated group. The objective is to minimize the total discrepancy, defined as the weight of the minimum-weight perfect matching between the two groups. To solve this problem, we propose a randomized experimental design, which we refer to as the "Pigeonhole Design." The pigeonhole design first partitions the covariate space into smaller regions, which we refer to as pigeonholes, and then, as experimental subjects arrive in each pigeonhole, balances the number of control and treated subjects within that pigeonhole. We analyze the theoretical performance of the pigeonhole design and demonstrate its effectiveness by comparing it against two well-known benchmark designs: the matched-pair design and the completely randomized design. We identify scenarios in which the pigeonhole design offers greater benefits than the benchmark designs. Finally, we conduct extensive simulations using Yahoo! data, which show a 10.2% reduction in variance when the pigeonhole design is used to estimate the average treatment effect.
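The two ideas in the abstract, partitioning the covariate space and balancing arms inside each part, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes covariates lie in the unit cube, that pigeonholes are equal-width grid cells, and that balance is enforced by flipping a fair coin for the first arrival of each within-pigeonhole pair and giving the opposite arm to the next arrival. The names `PigeonholeAssigner`, `cells_per_dim`, and `discrepancy_1d` are hypothetical. The discrepancy helper covers only one-dimensional covariates with absolute-difference cost, where pairing the sorted values in order gives a minimum-weight perfect matching.

```python
import random
from collections import defaultdict

CONTROL, TREATED = 0, 1


class PigeonholeAssigner:
    """Illustrative sketch: grid pigeonholes on [0, 1]^d with balanced arms."""

    def __init__(self, cells_per_dim, seed=None):
        self.cells_per_dim = cells_per_dim  # hypothetical grid resolution
        self.rng = random.Random(seed)
        # pigeonhole -> arm owed to its next arrival (absent when balanced)
        self.pending = {}
        # pigeonhole -> [number of control subjects, number of treated subjects]
        self.counts = defaultdict(lambda: [0, 0])

    def pigeonhole(self, x):
        # Map each coordinate to a cell index; a value of 1.0 goes to the last cell.
        last = self.cells_per_dim - 1
        return tuple(min(int(v * self.cells_per_dim), last) for v in x)

    def assign(self, x):
        """Assign one arriving subject immediately, using only past arrivals."""
        key = self.pigeonhole(x)
        if key in self.pending:
            # This pigeonhole has an unpaired subject: give the opposite arm.
            arm = self.pending.pop(key)
        else:
            # This pigeonhole is balanced: flip a fair coin, remember the debt.
            arm = self.rng.choice((CONTROL, TREATED))
            self.pending[key] = 1 - arm
        self.counts[key][arm] += 1
        return arm


def discrepancy_1d(control, treated):
    """Matching cost for scalar covariates with |c - t| cost (1-D case only)."""
    if len(control) != len(treated):
        raise ValueError("a perfect matching needs equally sized groups")
    return sum(abs(c - t) for c, t in zip(sorted(control), sorted(treated)))


if __name__ == "__main__":
    assigner = PigeonholeAssigner(cells_per_dim=4, seed=0)
    stream = random.Random(1)
    for _ in range(1000):
        assigner.assign((stream.random(), stream.random()))
    worst = max(abs(c - t) for c, t in assigner.counts.values())
    print("largest within-pigeonhole imbalance:", worst)
```

Under these assumptions the control and treated counts in any pigeonhole never differ by more than one. For covariates with more than one dimension, the discrepancy would need a general minimum-weight bipartite matching solver in place of the sorted pairing.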