Paper Title
Fast Rate Generalization Error Bounds: Variations on a Theme
Paper Authors
Paper Abstract
A recent line of works, initiated by Russo and Xu, has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of the relevant works, the convergence rate of the expected generalization error is of the form O(√(λ/n)), where λ is some information-theoretic quantity, such as the mutual information between the data sample and the learned hypothesis. However, such a rate is typically considered "slow" compared to the "fast rate" of O(1/n) achievable in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate, and that a fast-rate (O(1/n)) result can still be obtained from this bound under appropriate assumptions. Furthermore, we identify the key condition needed for fast-rate generalization error bounds, which we call the (η, c)-central condition. Under this condition, we give information-theoretic bounds on the generalization error and excess risk with a convergence rate of O(λ/n) for specific learning algorithms such as empirical risk minimization. Finally, analytical examples are given to show the effectiveness of the bounds.
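For context, a minimal LaTeX sketch of the two rates contrasted in the abstract. The slow-rate bound shown is the standard mutual-information bound of Xu and Raginsky; the sub-Gaussian parameter σ, the sample S = (Z_1, …, Z_n), and the learned hypothesis W are standard notation assumed here rather than taken from the abstract, and the exact form of the paper's (η, c)-central condition and fast-rate bound should be read from the paper itself (only the basic η-central condition of van Erven et al. is sketched).

% Slow rate: if the loss \ell(w, Z) is \sigma-sub-Gaussian for every
% hypothesis w, the expected generalization error satisfies
\[
  \bigl|\mathbb{E}[\mathrm{gen}(W, S)]\bigr|
    \le \sqrt{\frac{2\sigma^{2}\, I(W; S)}{n}},
  \qquad \text{i.e. } O\!\left(\sqrt{\lambda/n}\right)
  \text{ with } \lambda = I(W; S).
\]
% Fast rate: the (\eta, c)-central condition is a variant of the
% \eta-central condition, whose basic form requires a comparator
% hypothesis w^{*} such that
\[
  \mathbb{E}\Bigl[e^{-\eta\,\bigl(\ell(w, Z) - \ell(w^{*}, Z)\bigr)}\Bigr] \le 1
  \quad \text{for all } w.
\]
% Under such a condition, the paper reports bounds whose dependence
% on n improves to
\[
  \mathbb{E}[\mathrm{gen}(W, S)] = O\!\left(\frac{I(W; S)}{n}\right).
\]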