Paper Title

Proximal Policy Optimization via Enhanced Exploration Efficiency

Paper Authors

Junwei Zhang, Zhenghao Zhang, Shuai Han, Shuai Lü

Paper Abstract

Proximal policy optimization (PPO) is a deep reinforcement learning algorithm with outstanding performance, especially in continuous control tasks, but its performance is still limited by its exploration ability. In classical reinforcement learning there are schemes that make exploration more thorough and balance it against data exploitation, but their algorithmic complexity prevents them from being applied in complex environments. Focusing on continuous control tasks with dense rewards, this paper analyzes the assumptions behind the original Gaussian action exploration mechanism in the PPO algorithm and clarifies the influence of exploration ability on performance. Aiming at this exploration problem, we design an exploration enhancement mechanism based on uncertainty estimation. We then apply this exploration enhancement theory to PPO and propose the proximal policy optimization algorithm with an intrinsic exploration module (IEM-PPO), which can be used in complex environments. In the experiments, we evaluate our method on multiple tasks in the MuJoCo physics simulator and compare IEM-PPO with a curiosity-driven exploration algorithm (ICM-PPO) and the original PPO algorithm. The results demonstrate that IEM-PPO requires longer training time, but achieves better sample efficiency and higher cumulative reward, and exhibits stability and robustness.
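The abstract names two ingredients without spelling out the details: PPO's Gaussian action exploration for continuous control, and an intrinsic reward derived from an uncertainty estimate that is added to the environment reward. The sketch below is a minimal illustration of those two ideas under stated assumptions, not the paper's implementation: the ensemble-disagreement uncertainty estimator, the `beta` weighting coefficient, and all class names (`GaussianPolicy`, `EnsembleUncertainty`, `shaped_reward`) are hypothetical stand-ins introduced here for illustration.

```python
import torch
import torch.nn as nn


class GaussianPolicy(nn.Module):
    """Diagonal-Gaussian policy head, the standard action-exploration mechanism
    in PPO for continuous control: actions are sampled from N(mu(s), sigma)."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mu = nn.Linear(hidden, act_dim)               # state-dependent mean
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # state-independent log std

    def dist(self, obs):
        return torch.distributions.Normal(self.mu(self.body(obs)), self.log_std.exp())


class EnsembleUncertainty(nn.Module):
    """Hypothetical uncertainty estimator (an assumption, not the paper's module):
    disagreement among an ensemble of forward-dynamics models as an intrinsic signal."""
    def __init__(self, obs_dim, act_dim, n_models=5, hidden=64):
        super().__init__()
        self.models = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, obs_dim))
            for _ in range(n_models)
        ])

    def bonus(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        preds = torch.stack([m(x) for m in self.models])   # (n_models, batch, obs_dim)
        return preds.var(dim=0).mean(dim=-1)                # per-sample disagreement


def shaped_reward(r_ext, r_int, beta=0.1):
    """Combine the environment reward with the intrinsic bonus;
    beta is a hypothetical weighting coefficient."""
    return r_ext + beta * r_int


if __name__ == "__main__":
    obs_dim, act_dim = 11, 3                  # e.g. a small MuJoCo-sized task
    policy = GaussianPolicy(obs_dim, act_dim)
    intrinsic = EnsembleUncertainty(obs_dim, act_dim)

    obs = torch.randn(4, obs_dim)             # a batch of observations
    dist = policy.dist(obs)
    act = dist.sample()                       # Gaussian action exploration
    logp = dist.log_prob(act).sum(-1)         # log-prob used by the PPO ratio

    r_ext = torch.randn(4)                    # placeholder environment rewards
    r_total = shaped_reward(r_ext, intrinsic.bonus(obs, act))
    print(act.shape, logp.shape, r_total.shape)
```

In a full training loop the shaped reward would feed the usual PPO advantage estimation and clipped surrogate objective; the snippet above only checks that the pieces fit together shape-wise.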
