Finite-Time Analysis of Asynchronous Q-learning under Diminishing Step-Size from Control-Theoretic View

Paper Title


Finite-Time Analysis of Asynchronous Q-learning under Diminishing Step-Size from Control-Theoretic View

Authors

Lim, Han-Dong; Lee, Donghwan

Abstract


Q-learning has long been one of the most popular reinforcement learning algorithms, and its theoretical analysis has been an active research topic for decades. Although research on the asymptotic convergence of Q-learning has a long tradition, its non-asymptotic convergence has only recently come under active study. The main goal of this paper is to develop a new finite-time analysis of asynchronous Q-learning under Markovian observation models from a control-system viewpoint. In particular, we introduce a discrete-time, time-varying switching system model of Q-learning with diminishing step-sizes, which significantly improves on the recent switching-system analysis with constant step-sizes and leads to an \(\mathcal{O}\left( \sqrt{\frac{\log k}{k}} \right)\) convergence rate that is comparable to or better than most state-of-the-art results in the literature. Meanwhile, a similarity transformation technique is newly applied to overcome the difficulty that diminishing step-sizes pose for the analysis. The proposed analysis brings in additional insights, covers different scenarios, and provides new, simplified templates for analysis that deepen our understanding of Q-learning through its unique connection to discrete-time switching systems.
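To make the object of study concrete, here is a minimal sketch, not the authors' code or analysis, of tabular asynchronous Q-learning under Markovian observations: a single trajectory is generated by a behavior policy without resets, and at each step only the visited state-action pair is updated with a diminishing step size. Everything specific below is an assumption for illustration: the hypothetical two-state, two-action MDP (`P`, `R`), the uniform behavior policy, the schedule \(\alpha_k = \theta/(k+\xi)\) with hand-picked `theta` and `xi`, and the helper names `optimal_q` and `async_q_learning`. It shows only the update being analyzed; it does not reproduce the switching-system model or verify the stated rate.

```python
import numpy as np


def optimal_q(P, R, gamma, iters=1000):
    """Q* by value iteration: repeatedly apply Q <- R + gamma * P max_a' Q."""
    Q = np.zeros_like(R)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q


def async_q_learning(P, R, gamma, num_steps, theta=20.0, xi=200.0, seed=0):
    """Asynchronous tabular Q-learning along one Markovian trajectory.

    At step k only the visited pair (s_k, a_k) is updated, using the
    diminishing step size alpha_k = theta / (k + xi) (an assumed schedule).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros_like(R)
    s = 0
    for k in range(num_steps):
        a = rng.integers(n_actions)                 # uniform behavior policy (assumption)
        s_next = rng.choice(n_states, p=P[s, a])    # sample next state from P(. | s, a)
        alpha = theta / (k + xi)
        td_target = R[s, a] + gamma * Q[s_next].max()
        Q[s, a] += alpha * (td_target - Q[s, a])    # update only the visited pair
        s = s_next                                  # no resets: Markovian samples
    return Q


# Hypothetical 2-state, 2-action MDP: P[s, a, s'] and deterministic R[s, a].
P = np.array([[[0.7, 0.3], [0.2, 0.8]],
              [[0.4, 0.6], [0.9, 0.1]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
gamma = 0.5
Q_star = optimal_q(P, R, gamma)

for n in (1_000, 5_000, 20_000):
    err = np.abs(async_q_learning(P, R, gamma, n) - Q_star).max()
    print(f"k = {n:>6}: max |Q_k - Q*| = {err:.4f}")
```

Because every run uses the same seed, each shorter run is a prefix of the longer one, so the printed errors are checkpoints along a single trajectory. The values of `theta` and `xi` were chosen by hand for this toy chain and are not taken from the paper.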
