Online Statistical Inference in Decision-Making with Matrix Context

Paper Title

Online Statistical Inference in Decision-Making with Matrix Context

Authors

Qiyu Han, Will Wei Sun, Yichen Zhang

Abstract

The study of online decision-making problems that leverage contextual information has drawn notable attention due to their significant applications in fields ranging from healthcare to autonomous systems. In modern applications, contextual information can be rich and is often represented as a matrix. Moreover, while existing online decision algorithms mainly focus on reward maximization, less attention has been devoted to statistical inference. To address these gaps, in this work, we consider an online decision-making problem with a matrix context where the true model parameters have a low-rank structure. We propose a fully online procedure to conduct statistical inference with adaptively collected data. The low-rank structure of the model parameter and the adaptive nature of the data collection process make this difficult: standard low-rank estimators are biased and cannot be obtained in a sequential manner while existing inference approaches in sequential decision-making algorithms fail to account for the low-rankness and are also biased. To overcome these challenges, we introduce a new online debiasing procedure to simultaneously handle both sources of bias. Our inference framework encompasses both parameter inference and optimal policy value inference. In theory, we establish the asymptotic normality of the proposed online debiased estimators and prove the validity of the constructed confidence intervals for both inference tasks. Our inference results are built upon a newly developed low-rank stochastic gradient descent estimator and its convergence result, which are also of independent interest.
