Paper Title
Multi-Agent Deep Reinforcement Learning Based Resource Management in SWIPT Enabled Cellular Networks with H2H/M2M Co-Existence
Paper Authors
Paper Abstract
Machine-to-Machine (M2M) communication is crucial to the development of the Internet of Things (IoT). Cellular networks are widely regarded as the primary infrastructure for M2M communications, yet several key issues must be addressed before M2M communications can be deployed over them. Notably, the rapid growth of M2M traffic dramatically increases energy consumption and degrades the performance of existing Human-to-Human (H2H) traffic. Sustainable operation technology and resource management are effective means of addressing these issues. In this paper, we investigate a resource management problem in cellular networks with H2H/M2M coexistence. First, considering the energy-constrained nature of machine-type communication devices (MTCDs), we propose a novel network model enabled by simultaneous wireless information and power transfer (SWIPT), which empowers MTCDs to perform energy harvesting (EH) and information decoding simultaneously. Given the diverse characteristics of IoT devices, we subdivide MTCDs into critical and tolerable types, and we formulate the resource management problem as an energy efficiency (EE) maximization problem under diverse Quality-of-Service (QoS) constraints. Then, we develop a multi-agent deep reinforcement learning (DRL) based scheme to solve this problem. It provides optimal spectrum, transmit power, and power splitting (PS) ratio allocation policies, along with efficient model training under a designed behaviour-tracking-based state space and a common reward function. Finally, we verify that, with a reasonable training mechanism, multiple M2M agents successfully cooperate in a distributed manner, yielding network performance that outperforms other intelligent approaches in terms of convergence speed and satisfaction of the EE and QoS requirements.
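For concreteness, the optimization problem sketched in the abstract can be read as follows. This is a hedged reconstruction of a PS-based SWIPT setup, not the paper's exact model: the symbols B, rho_m (PS ratio), eta (EH efficiency), gamma_m (SINR), p_c (circuit power), and the rate floors are our own notation and assumptions.

```latex
% Hedged sketch of the EE maximization the abstract describes, assuming a
% PS receiver that routes a fraction rho_m of received power to decoding
% and the remainder 1 - rho_m to energy harvesting.
\begin{align}
  \max_{\{k_m,\, p_m,\, \rho_m\}} \quad
    & \mathrm{EE} = \frac{\sum_{m} R_m}{\sum_{m} \left( p_m + p_{\mathrm{c}} \right) - \sum_{m} E_m} \\
  \text{s.t.} \quad
    & R_m = B \log_2\!\left( 1 + \rho_m \,\gamma_m(k_m, p_m) \right), \qquad
      E_m = \eta \,(1 - \rho_m)\, P_m^{\mathrm{rx}}, \\
    & R_m \ge R^{\min}_{\mathrm{crit}} \ \text{(critical MTCDs)}, \qquad
      R_m \ge R^{\min}_{\mathrm{tol}} \ \text{(tolerable MTCDs)}, \\
    & \gamma_h \ge \gamma^{\min}_{\mathrm{H2H}} \ \text{(H2H protection)}, \qquad
      0 \le \rho_m \le 1, \qquad 0 \le p_m \le p^{\max}.
\end{align}
```

Each M2M agent jointly chooses a channel k_m, a transmit power p_m, and a PS ratio rho_m; harvested energy E_m offsets part of the consumed power in the EE denominator, which is why the PS ratio trades off rate against sustainability.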
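The multi-agent scheme with a common reward can be illustrated with a toy stand-in. The sketch below replaces the paper's deep networks with stateless (bandit-style) Q-tables purely for brevity; the agent count, power levels, PS grid, channel model, and reward shape are all illustrative assumptions, not taken from the paper. What it does show is the coupling the abstract describes: distributed agents, each picking a (channel, power, PS ratio) tuple, trained against one shared reward that returns network EE when every QoS floor is met and a penalty otherwise.

```python
import numpy as np

# Toy setup: 3 M2M agents, each picks a (channel, power level, PS ratio) tuple.
# Tabular, stateless stand-in for the paper's multi-agent DRL scheme.
rng = np.random.default_rng(0)
N_AGENTS, N_CH, N_PW, N_PS = 3, 2, 3, 3
N_ACTIONS = N_CH * N_PW * N_PS
POWERS = np.array([0.05, 0.1, 0.2])    # transmit power levels (W), assumed
PS_RATIOS = np.array([0.2, 0.5, 0.8])  # fraction of received power for decoding
NOISE, R_MIN = 1e-3, 0.5               # noise power, per-agent QoS rate floor

def common_reward(actions, gains):
    """Network EE if every agent meets its QoS rate floor, else a shared penalty."""
    rates, total_power = np.zeros(N_AGENTS), 0.0
    for i, a in enumerate(actions):
        ch, pw, ps = np.unravel_index(a, (N_CH, N_PW, N_PS))
        # co-channel interference from the other agents sharing this channel
        interf = sum(POWERS[np.unravel_index(b, (N_CH, N_PW, N_PS))[1]] * gains[j]
                     for j, b in enumerate(actions)
                     if j != i and np.unravel_index(b, (N_CH, N_PW, N_PS))[0] == ch)
        sinr = PS_RATIOS[ps] * POWERS[pw] * gains[i] / (interf + NOISE)
        rates[i] = np.log2(1.0 + sinr)
        total_power += POWERS[pw]
    if (rates < R_MIN).any():
        return -1.0                          # QoS violated: shared penalty
    return float(rates.sum() / total_power)  # energy efficiency proxy

Q = np.zeros((N_AGENTS, N_ACTIONS))          # one table per distributed agent
eps, alpha = 0.3, 0.1
for episode in range(5000):
    gains = rng.uniform(0.5, 1.5, N_AGENTS)  # i.i.d. channel gains, assumed
    actions = [rng.integers(N_ACTIONS) if rng.random() < eps else int(Q[i].argmax())
               for i in range(N_AGENTS)]
    r = common_reward(actions, gains)        # common reward couples the agents
    for i, a in enumerate(actions):
        Q[i, a] += alpha * (r - Q[i, a])     # stateless Q-learning update
    eps = max(0.01, eps * 0.999)             # decaying exploration

print("learned joint action:", [int(Q[i].argmax()) for i in range(N_AGENTS)])
```

Because every agent receives the same scalar reward, no agent can profit from a choice that starves another below its QoS floor, which is the cooperative behaviour the abstract reports; the paper's actual scheme additionally uses a behaviour-tracking-based state space and deep function approximation in place of these tables.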