Paper Title
Neural Stein critics with staged $L^2$-regularization
Paper Authors
Paper Abstract
Learning to differentiate model distributions from observed data is a fundamental problem in statistics and machine learning, and high-dimensional data remains a challenging setting for such problems. Metrics that quantify the disparity between probability distributions, such as the Stein discrepancy, play an important role in high-dimensional statistical testing. In this paper, we investigate the role of $L^2$ regularization in training a neural network Stein critic so as to distinguish between data sampled from an unknown probability distribution and a nominal model distribution. Making a connection to the Neural Tangent Kernel (NTK) theory, we develop a novel staging procedure for the weight of regularization over training time, which leverages the advantages of highly-regularized training at early times. Theoretically, we prove that the training dynamics are approximated by kernel optimization, namely ``lazy training'', when the $L^2$ regularization weight is large, and that training on $n$ samples converges at a rate of $O(n^{-1/2})$ up to a log factor. The result guarantees learning the optimal critic assuming sufficient alignment with the leading eigenmodes of the zero-time NTK. The benefit of the staged $L^2$ regularization is demonstrated on simulated high-dimensional data and in an application to evaluating generative models of image data.
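The staging procedure described above can be illustrated with a short training loop. The following is a minimal sketch, not the authors' implementation: it trains a vector-valued neural critic $f_\theta$ to maximize the regularized Stein objective $\mathbb{E}_p[T_q f] - (\lambda/2)\,\mathbb{E}_p\|f\|^2$ against a nominal Gaussian model $q$, with a two-stage regularization weight that is large early in training and smaller afterwards. The network architecture, the `staged_lambda` schedule and its switch point, and the placeholder data sampler are all illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 10  # data dimension (illustrative)

# Vector-valued critic f_theta: R^d -> R^d.
critic = nn.Sequential(nn.Linear(d, 128), nn.Tanh(), nn.Linear(128, d))
opt = torch.optim.SGD(critic.parameters(), lr=1e-2)

def score_q(x):
    # Score of the nominal model q; a standard Gaussian is assumed here
    # purely for illustration, so grad log q(x) = -x.
    return -x

def stein_term(x):
    # Stein operator applied to the critic:
    #   T_q f(x) = <score_q(x), f(x)> + div f(x),
    # with the divergence computed coordinate-wise via autograd.
    x = x.requires_grad_(True)
    f = critic(x)
    div = torch.zeros(x.shape[0])
    for i in range(d):
        div = div + torch.autograd.grad(f[:, i].sum(), x, create_graph=True)[0][:, i]
    return (score_q(x) * f).sum(dim=1) + div, f

def staged_lambda(t, T, lam_early=10.0, lam_late=0.1, switch=0.3):
    # Staged L^2 weight: heavily regularized (near-"lazy"/NTK) phase early on,
    # then a smaller weight for the rest of training. Values are hypothetical.
    return lam_early if t < switch * T else lam_late

T = 500
for t in range(T):
    # Placeholder samples from the unknown data distribution p:
    # a mean-shifted Gaussian, so that p differs from the nominal model q.
    x = torch.randn(200, d) + 0.5
    Tf, f = stein_term(x)
    lam = staged_lambda(t, T)
    # Maximize E_p[T_q f] - (lam/2) E_p ||f||^2, i.e. minimize its negation.
    loss = -Tf.mean() + 0.5 * lam * (f ** 2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, the fitted critic can be used to form a Stein-discrepancy statistic (e.g., the empirical mean of $T_q f$ on held-out samples) for testing the nominal model against the data.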