Paper Title

An Information-Theoretic Framework for Supervised Learning

Paper Authors

Hong Jun Jeon, Yifan Zhu, Benjamin Van Roy

Paper Abstract

Each year, deep learning demonstrates new and improved empirical results with deeper and wider neural networks. Meanwhile, with existing theoretical frameworks, it is difficult to analyze networks deeper than two layers without resorting to counting parameters or encountering sample complexity bounds that are exponential in depth. It may therefore be fruitful to analyze modern machine learning through a different lens. In this paper, we propose a novel information-theoretic framework, with its own notions of regret and sample complexity, for analyzing the data requirements of machine learning. With our framework, we first work through some classical examples, such as scalar estimation and linear regression, to build intuition and introduce general techniques. Then, we use the framework to study the sample complexity of learning from data generated by deep neural networks with ReLU activation units. For a particular prior distribution on weights, we establish sample complexity bounds that are simultaneously width-independent and linear in depth. This prior distribution gives rise to high-dimensional latent representations that, with high probability, admit reasonably accurate low-dimensional approximations. We conclude by corroborating our theoretical results with an experimental analysis of random single-hidden-layer neural networks.
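
The abstract refers to the framework's own notions of regret and sample complexity without defining them here. As orientation only, the LaTeX sketch below shows one common way such information-theoretic quantities are formalized; the symbols θ (the random data-generating process), H_T (the first T training pairs), and the exact form of the error are our assumptions, not necessarily the paper's definitions.

```latex
% Hedged sketch, not the paper's verbatim definitions. Let \theta denote the
% random data-generating process and H_T = ((X_1, Y_1), \dots, (X_T, Y_T))
% the training history. A typical information-theoretic estimation error:
\[
  \mathcal{L}_T \;=\; \mathbb{E}\left[ d_{\mathrm{KL}}\!\left(
      P\left(Y_{T+1} \in \cdot \mid \theta,\, X_{T+1}\right) \,\middle\|\,
      P\left(Y_{T+1} \in \cdot \mid H_T,\, X_{T+1}\right)
  \right) \right]
\]
% Sample complexity at tolerance \epsilon is then the smallest horizon at
% which this error falls below \epsilon:
\[
  T_\epsilon \;=\; \min\left\{ T \in \mathbb{N} \,:\, \mathcal{L}_T \le \epsilon \right\}
\]
```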
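
The abstract also mentions a closing experimental analysis of random single-hidden-layer neural networks. The sketch below (Python/NumPy, not the authors' code) illustrates that kind of setup under stated assumptions: weights are drawn from a placeholder Gaussian prior rather than the particular prior the paper analyzes, and all dimensions and constants are arbitrary. It generates data from a random ReLU network and checks how much of the hidden representation's variance a low-dimensional projection captures, echoing the claim that the latent representations admit reasonably accurate low-dimensional approximations.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, width, n_samples = 32, 1024, 256

# Placeholder Gaussian prior over weights (the paper's particular prior
# on weights is not reproduced here).
W1 = rng.normal(0.0, 1.0 / np.sqrt(input_dim), size=(width, input_dim))
w2 = rng.normal(0.0, 1.0 / np.sqrt(width), size=width)

# Generate inputs and pass them through the random network.
X = rng.normal(size=(n_samples, input_dim))
H = np.maximum(W1 @ X.T, 0.0)   # ReLU hidden representation, (width, n_samples)
Y = w2 @ H                      # scalar outputs, shape (n_samples,)

# How much of the hidden representation's variance do its top-k principal
# directions capture? (Illustrates "high-dimensional latent representations
# admit reasonably accurate low-dimensional approximations".)
k = 16
s = np.linalg.svd(H - H.mean(axis=1, keepdims=True), compute_uv=False)
explained = (s[:k] ** 2).sum() / (s ** 2).sum()
print(f"fraction of variance in top-{k} directions: {explained:.3f}")
```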
