Deep Reinforcement Learning-Based Adaptive IRS Control with Limited Feedback Codebooks

Paper title

Deep Reinforcement Learning-Based Adaptive IRS Control with Limited Feedback Codebooks

Authors

Kim, Junghoon, Hosseinalipour, Seyyedali, Marcum, Andrew C., Kim, Taejoon, Love, David J., Brinton, Christopher G.

Abstract

Intelligent reflecting surfaces (IRS) consist of configurable meta-atoms, which can alter the wireless propagation environment through the design of their reflection coefficients. We consider adaptive IRS control in the practical setting where (i) the IRS reflection coefficients are attained by adjusting tunable elements embedded in the meta-atoms, (ii) the IRS reflection coefficients are affected by the incident angles of the incoming signals, (iii) the IRS is deployed in multi-path, time-varying channels, and (iv) the feedback link from the base station (BS) to the IRS has a low data rate. Conventional optimization-based IRS control protocols, which rely on channel estimation and on conveying the optimized variables to the IRS, are not practical in this setting due to the difficulty of channel estimation and the low data rate of the feedback channel. To address these challenges, we develop a novel adaptive codebook-based limited feedback protocol to control the IRS. We propose two solutions for adaptive IRS codebook design: (i) random adjacency (RA), which utilizes correlations across the channel realizations, and (ii) deep neural network policy-based IRS control (DPIC), which is based on deep reinforcement learning. Numerical evaluations show that the data rate and the average data rate over one coherence time are improved substantially by the proposed schemes.
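As a rough illustration of the generic idea behind codebook-based limited feedback (this is not the authors' RA or DPIC scheme), the sketch below assumes a single-antenna BS and user, an idealized unit-modulus IRS reflection model that ignores the tunable-element and incident-angle effects the paper considers, perfect channel knowledge at the BS, and a hypothetical codebook of randomly drawn, 4-level quantized phase vectors. The BS scores every codeword and feeds back only the winning index, which costs ⌈log2 N⌉ bits for a codebook of N codewords instead of one real-valued phase per IRS element. All function names (`received_gain`, `select_codeword`, `random_codebook`, `feedback_bits`) are hypothetical.

```python
import cmath
import math
import random


def received_gain(direct, cascaded, phases):
    """|h_d + sum_n h_n * exp(j*theta_n)|^2 under an idealized unit-modulus IRS model (assumption)."""
    total = direct + sum(h * cmath.exp(1j * th) for h, th in zip(cascaded, phases))
    return abs(total) ** 2


def feedback_bits(codebook_size):
    """Bits needed to send one codeword index over the low-rate BS-to-IRS link."""
    return math.ceil(math.log2(codebook_size))


def select_codeword(direct, cascaded, codebook):
    """BS side: return the index of the codeword with the largest received gain."""
    gains = [received_gain(direct, cascaded, cw) for cw in codebook]
    return max(range(len(codebook)), key=gains.__getitem__)


def random_codebook(num_codewords, num_elements, levels=4, seed=0):
    """Hypothetical codebook: phase vectors drawn from `levels` uniformly spaced phases."""
    rng = random.Random(seed)
    step = 2 * math.pi / levels
    return [[step * rng.randrange(levels) for _ in range(num_elements)]
            for _ in range(num_codewords)]


if __name__ == "__main__":
    rng = random.Random(1)
    n_elements = 16
    # Hypothetical Rayleigh-like draws for the cascaded (BS-IRS-user) and direct channels.
    cascaded = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_elements)]
    direct = complex(rng.gauss(0, 1), rng.gauss(0, 1))
    codebook = random_codebook(64, n_elements)
    best = select_codeword(direct, cascaded, codebook)
    print(f"feed back index {best} using {feedback_bits(len(codebook))} bits")
```

In a setting where channel estimation is hard, as the abstract stresses, the BS could instead score codewords by measured received power; the index-only feedback step stays the same.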
