Paper Title

Recursive Reasoning in Minimax Games: A Level $k$ Gradient Play Method

Paper Authors

Zichu Liu, Lacra Pavel

Paper Abstract

Despite the success of generative adversarial networks (GANs) in generating visually appealing images, they are notoriously challenging to train. In order to stabilize the learning dynamics in minimax games, we propose a novel recursive reasoning algorithm: Level $k$ Gradient Play (Lv.$k$ GP) algorithm. In contrast to many existing algorithms, our algorithm does not require sophisticated heuristics or curvature information. We show that as $k$ increases, Lv.$k$ GP converges asymptotically towards an accurate estimation of players' future strategy. Moreover, we justify that Lv.$\infty$ GP naturally generalizes a line of provably convergent game dynamics which rely on predictive updates. Furthermore, we provide its local convergence property in nonconvex-nonconcave zero-sum games and global convergence in bilinear and quadratic games. By combining Lv.$k$ GP with Adam optimizer, our algorithm shows a clear advantage in terms of performance and computational overhead compared to other methods. Using a single Nvidia RTX3090 GPU and 30 times fewer parameters than BigGAN on CIFAR-10, we achieve an FID of 10.17 for unconditional image generation within 30 hours, allowing GAN training on common computational resources to reach state-of-the-art performance.
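For intuition about the recursive reasoning idea, below is a minimal, self-contained sketch of what a level-$k$ predictive update could look like on the bilinear game $f(x, y) = x^\top A y$, one of the settings the abstract says the paper analyzes. The specific update rule (each player responds to the opponent's level-$(k-1)$ anticipated strategy, always restarting from the current iterate), the step size, the spectral normalization of $A$, and the comparison against plain simultaneous gradient descent-ascent are illustrative assumptions based on the abstract, not the authors' reference implementation.

```python
# Toy sketch of recursive "level k" predictive updates on a bilinear game,
# assuming the update form described above; NOT the authors' reference code.
import numpy as np


def lvk_gradient_play(x, y, A, lr, k):
    """One joint level-k update on the bilinear game f(x, y) = x^T A y."""
    # Lv.0 anticipation: each player assumes the opponent stays put.
    x_hat, y_hat = x.copy(), y.copy()
    for _ in range(k):
        # Each level responds to the opponent's previous-level anticipation,
        # always restarting from the current strategies (x, y).
        x_next = x - lr * (A @ y_hat)    # minimizer: grad_x f(x, y_hat) = A y_hat
        y_next = y + lr * (A.T @ x_hat)  # maximizer: grad_y f(x_hat, y) = A^T x_hat
        x_hat, y_hat = x_next, y_next
    return x_hat, y_hat


def simultaneous_gda(x, y, A, lr):
    """Plain simultaneous gradient descent-ascent, for comparison."""
    return x - lr * (A @ y), y + lr * (A.T @ x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))
    A /= np.linalg.norm(A, 2)  # keep lr * ||A|| < 1 so the inner recursion is stable

    x_lv, y_lv = rng.standard_normal(3), rng.standard_normal(3)
    x_gd, y_gd = x_lv.copy(), y_lv.copy()

    for _ in range(200):
        x_lv, y_lv = lvk_gradient_play(x_lv, y_lv, A, lr=0.5, k=6)
        x_gd, y_gd = simultaneous_gda(x_gd, y_gd, A, lr=0.5)

    # Distance to the unique equilibrium (0, 0) of the bilinear game.
    print("Lv.k GP :", np.linalg.norm(np.concatenate([x_lv, y_lv])))
    print("Sim. GDA:", np.linalg.norm(np.concatenate([x_gd, y_gd])))
```

In this toy setting, plain simultaneous updates spiral away from the equilibrium at the origin while the level-$k$ anticipation pulls the iterates inward, which is consistent with the abstract's claim of global convergence in bilinear games.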
