Paper Title
Revisiting Discrete Soft Actor-Critic
Paper Authors
Paper Abstract
We study the adaptation of Soft Actor-Critic (SAC), which is considered a state-of-the-art reinforcement learning (RL) algorithm, from continuous to discrete action spaces. We revisit vanilla discrete SAC and provide an in-depth analysis of its Q-value underestimation and performance instability issues when applied to discrete settings. We thereby propose Stable Discrete SAC (SDSAC), an algorithm that leverages an entropy penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on typical benchmarks with discrete action spaces, including Atari games and a large-scale MOBA game, show the efficacy of our proposed method. Our code is at: https://github.com/coldsummerday/SD-SAC.git.
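To make the abstract's terminology concrete, below is a minimal sketch of how a "double average Q-learning with Q-clip" target for discrete SAC might be computed. This is not the authors' exact implementation; the function name `soft_q_target`, the argument names, the use of the previous estimate `q_old` as the clipping reference, and the specific averaging of two target heads are all illustrative assumptions.

```python
# Sketch of a clipped, averaged soft Q target for discrete SAC (illustrative only).
import torch
import torch.nn.functional as F

def soft_q_target(q1_target, q2_target, next_logits, rewards, dones,
                  q_old, gamma=0.99, alpha=0.2, clip_range=0.5):
    """Compute a clipped soft Q target for discrete actions.

    q1_target, q2_target: [B, A] target-network Q values per action (two heads).
    next_logits:          [B, A] policy logits at the next state.
    rewards, dones:       [B] transition reward and terminal flag.
    q_old:                [B] previous Q estimate, used here (as an assumption)
                          as the reference point for Q-clip.
    """
    probs = F.softmax(next_logits, dim=-1)
    log_probs = F.log_softmax(next_logits, dim=-1)
    # Average the two target heads rather than taking their minimum,
    # which the abstract suggests counteracts Q-value underestimation.
    q_avg = 0.5 * (q1_target + q2_target)
    # Soft state value: expectation over actions plus the entropy bonus.
    v_next = (probs * (q_avg - alpha * log_probs)).sum(dim=-1)
    target = rewards + gamma * (1.0 - dones) * v_next
    # Q-clip: keep the new target within a band around the previous estimate
    # to damp abrupt target jumps (one plausible formulation).
    return torch.clamp(target, q_old - clip_range, q_old + clip_range)
```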