Paper Title
Revisiting Discrete Soft Actor-Critic
Paper Authors
Paper Abstract
We study the adaptation of Soft Actor-Critic (SAC), which is considered a state-of-the-art reinforcement learning (RL) algorithm, from continuous to discrete action spaces. We revisit vanilla discrete SAC and provide an in-depth analysis of its Q-value underestimation and performance instability issues when applied to discrete settings. We thereby propose Stable Discrete SAC (SDSAC), an algorithm that leverages an entropy penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on typical benchmarks with discrete action spaces, including Atari games and a large-scale MOBA game, show the efficacy of our proposed method. Our code is at: https://github.com/coldsummerday/SD-SAC.git.
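To make the abstract's terminology concrete, below is a minimal sketch of how a "double average Q-learning with Q-clip" target for discrete SAC might be computed. This is not the authors' exact implementation; the function name `soft_q_target`, the argument names, the use of the previous estimate `q_old` as the clipping reference, and the specific averaging of two target heads are all illustrative assumptions.

```python
# Sketch of a clipped, averaged soft Q target for discrete SAC (illustrative only).
import torch
import torch.nn.functional as F

def soft_q_target(q1_target, q2_target, next_logits, rewards, dones,
                  q_old, gamma=0.99, alpha=0.2, clip_range=0.5):
    """Compute a clipped soft Q target for discrete actions.

    q1_target, q2_target: [B, A] target-network Q values per action (two heads).
    next_logits:          [B, A] policy logits at the next state.
    rewards, dones:       [B] transition reward and terminal flag.
    q_old:                [B] previous Q estimate, used here (as an assumption)
                          as the reference point for Q-clip.
    """
    probs = F.softmax(next_logits, dim=-1)
    log_probs = F.log_softmax(next_logits, dim=-1)
    # Average the two target heads rather than taking their minimum,
    # which the abstract suggests counteracts Q-value underestimation.
    q_avg = 0.5 * (q1_target + q2_target)
    # Soft state value: expectation over actions plus the entropy bonus.
    v_next = (probs * (q_avg - alpha * log_probs)).sum(dim=-1)
    target = rewards + gamma * (1.0 - dones) * v_next
    # Q-clip: keep the new target within a band around the previous estimate
    # to damp abrupt target jumps (one plausible formulation).
    return torch.clamp(target, q_old - clip_range, q_old + clip_range)
```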