Paper Title


Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for Wireless Networks with Stochastic Arrivals

Paper Authors

Eray Unsal Atay, Igor Kadota, Eytan Modiano

Paper Abstract


We consider a single-hop wireless network with sources transmitting time-sensitive information to the destination over multiple unreliable channels. Packets from each source are generated according to a stochastic process with known statistics and the state of each wireless channel (ON/OFF) varies according to a stochastic process with unknown statistics. The reliability of the wireless channels is to be learned through observation. At every time slot, the learning algorithm selects a single pair (source, channel) and the selected source attempts to transmit its packet via the selected channel. The probability of a successful transmission to the destination depends on the reliability of the selected channel. The goal of the learning algorithm is to minimize the Age-of-Information (AoI) in the network over $T$ time slots. To analyze the performance of the learning algorithm, we introduce the notion of AoI regret, which is the difference between the expected cumulative AoI of the learning algorithm under consideration and the expected cumulative AoI of a genie algorithm that knows the reliability of the channels a priori. The AoI regret captures the penalty incurred by having to learn the statistics of the channels over the $T$ time slots. The results are two-fold: first, we consider learning algorithms that employ well-known solutions to the stochastic multi-armed bandit problem (such as $\epsilon$-Greedy, Upper Confidence Bound, and Thompson Sampling) and show that their AoI regret scales as $\Theta(\log T)$; second, we develop a novel learning algorithm and show that it has $O(1)$ regret. To the best of our knowledge, this is the first learning algorithm with bounded AoI regret.
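
To make the regret notion concrete, the display below writes out the AoI-regret definition described in the abstract; the shorthand $A^\pi(t)$ for the total AoI in the network at slot $t$ under policy $\pi$, and $\pi^*$ for the genie policy that knows the channel reliabilities a priori, are our notation, not necessarily the paper's.

```latex
% AoI regret of a learning policy \pi over a horizon of T slots:
R^{\pi}(T) \;=\;
\mathbb{E}\!\left[\sum_{t=1}^{T} A^{\pi}(t)\right]
\;-\;
\mathbb{E}\!\left[\sum_{t=1}^{T} A^{\pi^*}(t)\right]
```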
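As a concrete illustration of the setting only (not the paper's order-optimal algorithm), here is a minimal Python simulation sketch: Bernoulli packet arrivals with known rates, ON/OFF channels with unknown reliabilities learned via UCB1, and a simple max-age heuristic for picking the source. The parameter values, the max-age source rule, and all variable names are our own assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K, T = 2, 3, 10_000          # sources, channels, horizon (assumed values)
lam = np.array([0.6, 0.4])      # known packet-arrival rates, one per source
mu = np.array([0.9, 0.5, 0.2])  # unknown channel reliabilities (ON probabilities)

age = np.ones(N)                # AoI of each source at the destination
fresh = np.full(N, -1)          # generation slot of freshest undelivered packet (-1: none)
succ = np.zeros(K)              # per-channel success counts
pulls = np.zeros(K)             # per-channel attempt counts
cum_aoi = 0.0

for t in range(1, T + 1):
    # Stochastic arrivals: a new packet replaces any older undelivered one.
    arrived = rng.random(N) < lam
    fresh[arrived] = t

    # Heuristic source choice: largest current age among sources holding a packet.
    holders = np.flatnonzero(fresh >= 0)
    if holders.size > 0:
        i = holders[np.argmax(age[holders])]
        # UCB1 channel choice over the estimated reliabilities.
        ucb = np.where(
            pulls > 0,
            succ / np.maximum(pulls, 1)
            + np.sqrt(2 * np.log(t) / np.maximum(pulls, 1)),
            np.inf,
        )
        k = int(np.argmax(ucb))
        pulls[k] += 1
        if rng.random() < mu[k]:   # channel was ON: delivery succeeds
            succ[k] += 1
            age[i] = t - fresh[i]  # AoI resets to the delivered packet's age
            fresh[i] = -1

    age += 1                       # every source ages by one slot
    cum_aoi += age.sum()

print("average AoI per slot:", cum_aoi / T)
```

Comparing this learner's cumulative AoI against the same loop run with `mu` known (always picking `np.argmax(mu)`) gives an empirical estimate of the AoI regret defined above; per the abstract, a UCB-style learner's gap grows as $\Theta(\log T)$.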
