Paper Title

Quantifying Relevance in Learning and Inference

Paper Authors

Matteo Marsili, Yasser Roudi

Paper Abstract

Learning is a distinctive feature of intelligent behaviour. High-throughput experimental data and Big Data promise to open new windows on complex systems such as cells, the brain or our societies. Yet, the puzzling success of Artificial Intelligence and Machine Learning shows that we still have a poor conceptual understanding of learning. These applications push statistical inference into uncharted territories where data is high-dimensional and scarce, and prior information on "true" models is scant if not totally absent. Here we review recent progress on understanding learning, based on the notion of "relevance". The relevance, as we define it here, quantifies the amount of information that a dataset or the internal representation of a learning machine contains on the generative model of the data. This allows us to define maximally informative samples, on the one hand, and optimal learning machines on the other. These are ideal limits of samples and of machines that contain the maximal amount of information about the unknown generative process at a given resolution (or level of compression). Both ideal limits exhibit critical features in the statistical sense: Maximally informative samples are characterised by a power-law frequency distribution (statistical criticality) and optimal learning machines by an anomalously large susceptibility. The trade-off between resolution (i.e. compression) and relevance distinguishes the regime of noisy representations from that of lossy compression. These are separated by a special point characterised by Zipf's law statistics. This identifies samples obeying Zipf's law as the most compressed lossless representations that are optimal in the sense of maximal relevance. Criticality in optimal learning machines manifests in an exponential degeneracy of energy levels, which leads to unusual thermodynamic properties.
