Paper Title

A Theory of Universal Learning

Authors

Olivier Bousquet, Steve Hanneke, Shay Moran, Ramon van Handel, Amir Yehudayoff

Abstract

How quickly can a given class of concepts be learned from examples? It is common to measure the performance of a supervised machine learning algorithm by plotting its "learning curve", that is, the decay of the error rate as a function of the number of training examples. However, the classical theoretical framework for understanding learnability, the PAC model of Vapnik-Chervonenkis and Valiant, does not explain the behavior of learning curves: the distribution-free PAC model of learning can only bound the upper envelope of the learning curves over all possible data distributions. This does not match the practice of machine learning, where the data source is typically fixed in any given scenario, while the learner may choose the number of training examples on the basis of factors such as computational resources and desired accuracy. In this paper, we study an alternative learning model that better captures such practical aspects of machine learning, but still gives rise to a complete theory of the learnable in the spirit of the PAC model. More precisely, we consider the problem of universal learning, which aims to understand the performance of learning algorithms on every data distribution, but without requiring uniformity over the distribution. The main result of this paper is a remarkable trichotomy: there are only three possible rates of universal learning. More precisely, we show that the learning curves of any given concept class decay at either an exponential, linear, or arbitrarily slow rate. Moreover, each of these cases is completely characterized by appropriate combinatorial parameters, and we exhibit optimal learning algorithms that achieve the best possible rate in each case. For concreteness, we consider in this paper only the realizable case, though analogous results are expected to extend to more general learning scenarios.
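To make the notion of a distribution-dependent learning curve concrete, below is a minimal Python sketch (not taken from the paper) that empirically traces one such curve in the realizable setting. The concept class (thresholds on [0,1]), the data distribution (uniform), the target threshold, and the ERM learner are all illustrative assumptions, not the paper's optimal universal learner.

```python
import random

# Illustrative sketch: trace a learning curve for ONE fixed data
# distribution, as in the universal learning viewpoint of the abstract.
# Concept class: thresholds h_t(x) = 1[x >= t] on [0,1]; target t* = 0.5.
# Learner: ERM returning a threshold consistent with the sample.
# These choices are assumptions for illustration only.

def sample(n, t_star=0.5):
    """Draw n labeled examples from the uniform distribution on [0,1]."""
    xs = [random.random() for _ in range(n)]
    return [(x, int(x >= t_star)) for x in xs]

def erm_threshold(data):
    """Return the midpoint of the version space of consistent thresholds."""
    lo = max((x for x, y in data if y == 0), default=0.0)
    hi = min((x for x, y in data if y == 1), default=1.0)
    return (lo + hi) / 2

def true_error(t_hat, t_star=0.5):
    """Under the uniform distribution, the error rate is the measure of
    the interval between the learned and true thresholds."""
    return abs(t_hat - t_star)

if __name__ == "__main__":
    random.seed(0)
    for n in [10, 100, 1000, 10000]:
        errs = [true_error(erm_threshold(sample(n))) for _ in range(200)]
        print(f"n={n:6d}  mean error = {sum(errs) / len(errs):.5f}")
```

Running this prints a mean error that shrinks roughly like 1/n, the classical PAC rate for a VC-dimension-1 class. The point of the universal learning model is that such a curve is defined per distribution; the paper's trichotomy classifies how fast these per-distribution curves can decay for a given concept class.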
