Paper Title
High-Throughput Synchronous Deep RL
Paper Authors
Paper Abstract
Deep reinforcement learning (RL) is computationally demanding and requires processing of many data points. Synchronous methods enjoy training stability while having lower data throughput. In contrast, asynchronous methods achieve high throughput but suffer from stability issues and lower sample efficiency due to `stale policies.' To combine the advantages of both methods, we propose High-Throughput Synchronous Deep Reinforcement Learning (HTS-RL). In HTS-RL, we perform learning and rollouts concurrently, devise a system design which avoids `stale policies,' and ensure that actors interact with environment replicas in an asynchronous manner while maintaining full determinism. We evaluate our approach on Atari games and the Google Research Football environment. Compared to synchronous baselines, HTS-RL is 2-6$\times$ faster. Compared to state-of-the-art asynchronous methods, HTS-RL has competitive throughput and consistently achieves higher average episode rewards.
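To make the concurrency idea in the abstract concrete, below is a minimal, hypothetical Python sketch of overlapping rollout collection and learning through a one-slot buffer. It only illustrates the general pattern of actors stepping environment replicas while a learner consumes the previous batch; the names (`make_env`, `collect_rollouts`, `update_policy`) are placeholders, and the sketch does not reproduce HTS-RL's actual mechanisms for avoiding stale policies or guaranteeing determinism.

```python
# Illustrative sketch only (not the paper's system design): rollouts for the
# next iteration are collected while the learner updates on the current batch.
import threading
import queue
import random

NUM_ITERATIONS = 5   # training iterations in this toy example
ROLLOUT_LEN = 8      # steps collected per environment replica per iteration


def make_env(seed):
    """Toy environment replica: a seeded RNG standing in for a real env."""
    return random.Random(seed)


def collect_rollouts(policy_version, envs):
    """Actors step their environment replicas using a fixed policy snapshot."""
    batch = []
    for env in envs:
        batch.extend((policy_version, env.random()) for _ in range(ROLLOUT_LEN))
    return batch


def update_policy(policy_version, batch):
    """Learner consumes one batch and produces the next policy version."""
    avg_reward = sum(r for _, r in batch) / len(batch)
    print(f"trained policy v{policy_version + 1} on {len(batch)} steps "
          f"(avg reward {avg_reward:.3f})")
    return policy_version + 1


def main():
    envs = [make_env(seed) for seed in range(4)]
    batches = queue.Queue(maxsize=1)  # one-slot buffer: at most one batch in flight
    policy_version = 0

    def actor_loop():
        # Actors run in their own thread, so rollout collection overlaps
        # with the learner's gradient updates on the previous batch.
        version = 0
        for _ in range(NUM_ITERATIONS):
            batches.put(collect_rollouts(version, envs))  # blocks if buffer is full
            version += 1

    actor_thread = threading.Thread(target=actor_loop)
    actor_thread.start()

    for _ in range(NUM_ITERATIONS):
        batch = batches.get()  # learning proceeds while the next rollout is collected
        policy_version = update_policy(policy_version, batch)

    actor_thread.join()


if __name__ == "__main__":
    main()
```

A purely synchronous baseline would run `collect_rollouts` and `update_policy` back to back in a single loop; the point of the overlap shown here is that neither the actors nor the learner sit idle, which is the throughput gain the abstract refers to.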