Paper Title
Quantum Multi-Agent Meta Reinforcement Learning
Paper Authors
Paper Abstract
Although quantum supremacy is yet to come, there has recently been increasing interest in identifying the potential of quantum machine learning (QML) in the looming era of practical quantum computing. Motivated by this, in this article we re-design multi-agent reinforcement learning (MARL) based on a unique characteristic of quantum neural networks (QNNs): two separate dimensions of trainable parameters, namely angle parameters affecting the output qubit states and pole parameters associated with the output measurement basis. Exploiting this dyadic trainability as a meta-learning capability, we propose quantum meta MARL (QM2ARL), which first applies angle training for meta-QNN learning, followed by pole training for few-shot or local-QNN training. To avoid overfitting, we develop an angle-to-pole regularization technique that injects noise into the pole domain during angle training. Furthermore, by exploiting the pole as the memory address of each trained QNN, we introduce the concept of pole memory, which allows saving and loading trained QNNs using only two-parameter pole values. We theoretically prove the convergence of angle training under angle-to-pole regularization, and by simulation we corroborate the effectiveness of QM2ARL in achieving high reward and fast convergence, as well as that of the pole memory in fast adaptation to a time-varying environment.
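To make the dyadic parameterization concrete, below is a minimal single-qubit sketch in NumPy. It is not the paper's implementation: the function names (qnn_output, angle_loss_with_pole_noise), the two-angle state preparation, and the Gaussian noise model are assumptions for illustration. The abstract only specifies that angle parameters shape the output qubit state, that two pole parameters select the output measurement basis, and that angle-to-pole regularization injects noise into the pole domain during angle training.

```python
import numpy as np

# Pauli matrices used to build the pole-parameterized observable.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(a):
    """Single-qubit rotation about the Y axis."""
    return np.array([[np.cos(a / 2), -np.sin(a / 2)],
                     [np.sin(a / 2),  np.cos(a / 2)]], dtype=complex)

def rz(a):
    """Single-qubit rotation about the Z axis."""
    return np.array([[np.exp(-1j * a / 2), 0],
                     [0, np.exp(1j * a / 2)]], dtype=complex)

def qnn_output(angles, pole):
    """Expectation value of a measurement whose basis is set by the two
    pole parameters (theta, phi), taken on a state prepared by the angle
    parameters. A toy stand-in for the QNN described in the abstract."""
    alpha, beta = angles
    theta, phi = pole
    psi = rz(beta) @ ry(alpha) @ np.array([1, 0], dtype=complex)
    # Observable whose +1 eigenvector sits at Bloch-sphere pole (theta, phi).
    M = (np.sin(theta) * np.cos(phi) * X
         + np.sin(theta) * np.sin(phi) * Y
         + np.cos(theta) * Z)
    return float(np.real(psi.conj() @ M @ psi))

def angle_loss_with_pole_noise(angles, pole, target, sigma, rng):
    """Angle-training loss with angle-to-pole regularization: Gaussian
    noise (an assumed noise model) is injected into the pole parameters
    so the learned angles do not overfit a single measurement basis."""
    noisy_pole = pole + rng.normal(0.0, sigma, size=2)
    return (qnn_output(angles, noisy_pole) - target) ** 2

rng = np.random.default_rng(0)
meta_angles = np.array([0.3, 1.2])   # shared, meta-trained angle parameters
ref_pole = np.array([0.0, 0.0])      # Z-basis measurement as the reference pole
print(angle_loss_with_pole_noise(meta_angles, ref_pole,
                                 target=1.0, sigma=0.1, rng=rng))
```

Under this reading, the pole memory of the abstract amounts to indexing each locally adapted QNN by its two pole values, e.g., a map from (theta, phi) pairs to tasks, so that a trained model is "loaded" by switching the measurement pole while the shared angle parameters stay fixed.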