Paper Title
Bayesian Optimization Enhanced Deep Reinforcement Learning for Trajectory Planning and Network Formation in Multi-UAV Networks
Paper Authors
Paper Abstract
In this paper, we employ multiple UAVs coordinated by a base station (BS) to help ground users (GUs) offload their sensing data. The UAVs can adapt their trajectories and network formation to expedite data transmission via multi-hop relaying. Trajectory planning aims to collect all GUs' data, while network formation optimizes the multi-hop UAV network topology to minimize energy consumption and transmission delay. The joint network formation and trajectory optimization problem is solved by a two-step iterative approach. First, we devise an adaptive network formation scheme that uses a heuristic algorithm to balance the UAVs' energy consumption and data queue sizes. Then, with the network formation fixed, the UAVs' trajectories are further optimized by multi-agent deep reinforcement learning, without knowledge of the GUs' traffic demands or spatial distribution. To improve learning efficiency, we further employ Bayesian optimization to estimate the UAVs' flying decisions from historical trajectory points. This helps avoid inefficient action exploration and improves the convergence rate of model training. Simulation results reveal a close spatial-temporal coupling between the UAVs' trajectory planning and network formation. Compared with several baselines, our solution better exploits the UAVs' cooperation in data offloading, thus improving energy efficiency and delay performance.
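The abstract does not specify the network formation heuristic, so the following is only a minimal sketch of what a rule that balances energy against queue size could look like. Everything in it is an assumption made for illustration: the function name `form_network`, the squared-distance energy proxy, the normalized queue term, the `weight` parameter, and the rule that a relay must be strictly closer to the BS (which keeps the topology loop-free). It is not the paper's algorithm.

```python
import math

BS = "BS"  # label for the base station (illustrative)

def form_network(uav_pos, queue, bs_pos, weight=0.5):
    """Return {uav: next_hop} from a greedy, loop-free heuristic.

    uav_pos: {name: (x, y)}; queue: {name: backlog}.
    The cost terms below are hypothetical, not the paper's model.
    """
    dist_to_bs = {u: math.dist(p, bs_pos) for u, p in uav_pos.items()}
    q_max = max(queue.values()) or 1  # avoid division by zero
    next_hop = {}
    for u, p in uav_pos.items():
        # Reference energy: squared distance of the direct link to the BS.
        d_ref = dist_to_bs[u] ** 2 or 1.0
        # Direct link: normalized energy 1, no relay backlog.
        best, best_cost = BS, weight * 1.0
        for v, pv in uav_pos.items():
            # Only relays strictly closer to the BS are allowed (no loops).
            if v == u or dist_to_bs[v] >= dist_to_bs[u]:
                continue
            energy = math.dist(p, pv) ** 2 / d_ref  # shorter hop, less energy
            backlog = queue[v] / q_max              # busy relays cost more
            cost = weight * energy + (1 - weight) * backlog
            if cost < best_cost:
                best, best_cost = v, cost
        next_hop[u] = best
    return next_hop

positions = {"A": (10, 0), "B": (5, 0), "C": (1, 0)}
idle = form_network(positions, {"A": 0, "B": 0, "C": 0}, (0, 0))
busy = form_network(positions, {"A": 0, "B": 10, "C": 0}, (0, 0))
print(idle)
print(busy)
```

In this toy setup, UAV A relays through B when all queues are empty, but bypasses B once B's queue is the largest in the network. In the two-step scheme described above, a formation like this would be held fixed while the trajectories are trained, then recomputed.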