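Asynchronous Curriculum Experience Replay: A Deep Reinforcement Learning Approach for UAV Autonomous Motion Control in Unknown Dynamic Environments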

Paper title

Asynchronous Curriculum Experience Replay: A Deep Reinforcement Learning Approach for UAV Autonomous Motion Control in Unknown Dynamic Environments

Authors

Zijian Hu, Xiaoguang Gao, Kaifang Wan, Qianglong Wang, Yiwei Zhai

Abstract

Unmanned aerial vehicles (UAVs) have been widely used in military warfare. In this paper, we formulate the autonomous motion control (AMC) problem as a Markov decision process (MDP) and propose an advanced deep reinforcement learning (DRL) method that allows UAVs to execute complex tasks in large-scale dynamic three-dimensional (3D) environments. To overcome the limitations of the prioritized experience replay (PER) algorithm and improve performance, the proposed asynchronous curriculum experience replay (ACER) uses multithreads to asynchronously update the priorities, assigns the true priorities and applies a temporary experience pool to make available experiences of higher quality for learning. A first-in-useless-out (FIUO) experience pool is also introduced to ensure the higher use value of the stored experiences. In addition, combined with curriculum learning (CL), a more reasonable training paradigm of sampling experiences from simple to difficult is designed for training UAVs. By training in a complex unknown environment constructed based on the parameters of a real UAV, the proposed ACER improves the convergence speed by 24.66% and the convergence result by 5.59% compared to the state-of-the-art twin delayed deep deterministic policy gradient (TD3) algorithm. The testing experiments carried out in environments with different complexities demonstrate the strong robustness and generalization ability of the ACER agent.
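For readers unfamiliar with the prioritized experience replay (PER) baseline that the abstract says ACER improves on, the following is a minimal sketch of standard proportional PER. It is not the authors' ACER and includes none of ACER's asynchronous priority updates, temporary pool, FIUO eviction, or curriculum sampling. The class name `ProportionalReplay`, the parameters `alpha`, `beta`, and `eps`, and the list-based O(n) sampling are all illustrative assumptions.

```python
import random


class ProportionalReplay:
    """Illustrative proportional PER buffer (hypothetical names, O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps          # keeps every priority strictly positive
        self.items = []
        self.priorities = []
        self.next_slot = 0      # slot overwritten once the buffer is full (FIFO)

    def add(self, transition):
        # New transitions take the current maximum priority so they get replayed soon.
        p = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(p)
        else:
            self.items[self.next_slot] = transition
            self.priorities[self.next_slot] = p
        self.next_slot = (self.next_slot + 1) % self.capacity

    def probabilities(self):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, beta=0.4, rng=random):
        probs = self.probabilities()
        n = len(self.items)
        idx = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights (N * P(i))^(-beta), scaled by the batch maximum.
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        return idx, [self.items[i] for i in idx], [w / max_w for w in weights]

    def update(self, indices, td_errors):
        # Priorities are refreshed only for the transitions that were just replayed.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In this baseline, a stored transition's priority goes stale until that transition is sampled again, and eviction is purely first-in-first-out. These appear to be the kinds of limitations the abstract's asynchronous priority updates and FIUO pool are meant to address, though the abstract does not give their exact mechanisms.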
