Lifelong Machine Learning of Functionally Compositional Structures

Title

Lifelong Machine Learning of Functionally Compositional Structures

Author

Mendez, Jorge A.

Abstract

A hallmark of human intelligence is the ability to construct self-contained chunks of knowledge and reuse them in novel combinations for solving different problems. Learning such compositional structures has been a challenge for artificial systems, due to the underlying combinatorial search. To date, research into compositional learning has largely proceeded separately from work on lifelong or continual learning. This dissertation integrated these two lines of work to present a general-purpose framework for lifelong learning of functionally compositional structures. The framework separates the learning into two stages: learning how to combine existing components to assimilate a novel problem, and learning how to adapt the existing components to accommodate the new problem. This separation explicitly handles the trade-off between stability and flexibility. This dissertation instantiated the framework into various supervised and reinforcement learning (RL) algorithms. Supervised learning evaluations found that 1) compositional models improve lifelong learning of diverse tasks, 2) the multi-stage process permits lifelong learning of compositional knowledge, and 3) the components learned by the framework represent self-contained and reusable functions. Similar RL evaluations demonstrated that 1) algorithms under the framework accelerate the discovery of high-performing policies, and 2) these algorithms retain or improve performance on previously learned tasks. The dissertation extended one lifelong compositional RL algorithm to the nonstationary setting, where the task distribution varies over time, and found that modularity permits individually tracking changes to different elements in the environment. The final contribution of this dissertation was a new benchmark for compositional RL, which exposed that existing methods struggle to discover the compositional properties of the environment.
