Title
Agnostic Learning of General ReLU Activation Using Gradient Descent
Authors
Abstract
We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function with moderate bias under Gaussian distributions. Unlike prior work that studies the setting of zero bias, we consider the more challenging scenario when the bias of the ReLU function is non-zero. Our main result establishes that starting from random initialization, in a polynomial number of iterations gradient descent outputs, with high probability, a ReLU function that achieves an error that is within a constant factor of the optimal error of the best ReLU function with moderate bias. We also provide finite sample guarantees, and these techniques generalize to a broader class of marginal distributions beyond Gaussians.
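The setup the abstract describes can be sketched numerically: run plain gradient descent, from random initialization, on the empirical square loss of a single ReLU with a non-zero bias, with Gaussian marginals and corrupted labels (the agnostic setting). This is a minimal illustration only; the dimensions, step size, noise level, and iteration count are assumptions for the demo, not values from the paper.

```python
# Hypothetical sketch: GD on the square loss of ReLU(w.x + b) under a Gaussian
# marginal with noisy labels. All hyperparameters below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 20000

# Target ReLU with moderate (non-zero) bias; labels are corrupted, so no
# ReLU fits them exactly -- the agnostic setting.
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
b_star = 0.5

X = rng.standard_normal((n, d))                 # Gaussian marginal
y = np.maximum(X @ w_star + b_star, 0.0)
y += 0.05 * rng.standard_normal(n)              # label noise

def loss(w, b):
    return np.mean((np.maximum(X @ w + b, 0.0) - y) ** 2)

# Random initialization, then plain gradient descent.
w = rng.standard_normal(d) / np.sqrt(d)
b = 0.0
eta = 0.1
loss_init = loss(w, b)
for _ in range(500):
    z = X @ w + b
    residual = (np.maximum(z, 0.0) - y) * (z > 0)   # ReLU (sub)gradient
    w -= eta * (X.T @ residual) / n
    b -= eta * residual.mean()

print(loss_init, loss(w, b))  # final loss approaches the noise floor
```

The final squared error settles near the label-noise variance, consistent in spirit with the paper's guarantee that GD reaches a constant factor of the optimal error, though this demo proves nothing about the worst case.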