Paper Title

Whittle index based Q-learning for restless bandits with average reward

Authors

Konstantin E. Avrachenkov, Vivek S. Borkar

Abstract

A novel reinforcement learning algorithm is introduced for multiarmed restless bandits with average reward, using the paradigms of Q-learning and Whittle index. Specifically, we leverage the structure of the Whittle index policy to reduce the search space of Q-learning, resulting in major computational gains. Rigorous convergence analysis is provided, supported by numerical experiments. The numerical experiments show excellent empirical performance of the proposed scheme.
