Paper Title

Neural Networks and the Chomsky Hierarchy

Paper Authors

Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega

Paper Abstract

Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (20'910 models, 15 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice. We demonstrate that grouping tasks according to the Chomsky hierarchy allows us to forecast whether certain architectures will be able to generalize to out-of-distribution inputs. This includes negative results where even extensive amounts of data and training time never lead to any non-trivial generalization, despite models having sufficient capacity to fit the training data perfectly. Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, LSTMs can solve regular and counter-language tasks, and only networks augmented with structured memory (such as a stack or memory tape) can successfully generalize on context-free and context-sensitive tasks.
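
To make concrete what "generalizing to out-of-distribution inputs" means in this setting, the sketch below pairs toy sequence tasks at three levels of the Chomsky hierarchy (parity as a regular task, string reversal as a context-free task, string duplication as a context-sensitive task) with a length-generalization evaluation loop: the model is assumed to have been trained on short sequences and is scored only on longer ones. This is a minimal illustration; the task generators and function names are assumptions for exposition, not the paper's released benchmark code.

```python
import random

# Illustrative sequence tasks at different levels of the Chomsky hierarchy.
# These generators are hypothetical examples, not the paper's benchmark implementation.

def parity_task(length):
    """Regular: output 1 if the bit string contains an odd number of 1s."""
    bits = [random.randint(0, 1) for _ in range(length)]
    return bits, sum(bits) % 2

def reverse_task(length):
    """Context-free: output the input string reversed."""
    bits = [random.randint(0, 1) for _ in range(length)]
    return bits, bits[::-1]

def duplicate_task(length):
    """Context-sensitive: output the input string concatenated with itself."""
    bits = [random.randint(0, 1) for _ in range(length)]
    return bits, bits + bits

def length_generalization_eval(model, task, test_lens=(50, 100, 200), n=1000):
    """Score a model (any callable mapping an input sequence to an output)
    on sequence lengths longer than those seen during training, which is the
    notion of out-of-distribution generalization studied in the paper."""
    results = {}
    for length in test_lens:
        correct = 0
        for _ in range(n):
            x, y = task(length)
            correct += int(model(x) == y)
        results[length] = correct / n
    return results

# Example usage with a trivial "oracle" stand-in for a trained network:
if __name__ == "__main__":
    oracle = lambda x: sum(x) % 2  # solves parity by construction
    print(length_generalization_eval(oracle, parity_task))
```

In this framing, an architecture "solves" a task level only if its accuracy stays near-perfect as the test length grows well beyond the training range, rather than merely fitting the training distribution.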
