Paper Title
Hierarchical Decomposition of Nonlinear Dynamics and Control for System Identification and Policy Distillation
Authors
Abstract
The control of nonlinear dynamical systems remains a major challenge for autonomous agents. Current trends in reinforcement learning (RL) focus on complex representations of dynamics and policies, which have yielded impressive results in solving a variety of hard control tasks. However, this new sophistication and these extremely over-parameterized models have come at the cost of an overall reduction in our ability to interpret the resulting policies. In this paper, we take inspiration from the control community and apply the principles of hybrid switching systems to break down complex dynamics into simpler components. We exploit the rich representational power of probabilistic graphical models and derive an expectation-maximization (EM) algorithm for learning a sequence model that captures the temporal structure of the data and automatically decomposes nonlinear dynamics into stochastic switching linear dynamical systems. Moreover, we show how this framework of switching models enables extracting hierarchies of Markovian and auto-regressive locally linear controllers from nonlinear experts in an imitation learning scenario.
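To make the idea of a stochastic switching linear dynamical system concrete, the following is a minimal sketch of how such a system generates data. It is not the paper's model or its EM algorithm: it assumes a plain Markov chain over two discrete modes, whereas the paper's switching mechanism may differ. The function name `simulate_slds` and all matrices and parameters are hypothetical values chosen for illustration.

```python
import numpy as np


def simulate_slds(A, c, P, x0, steps, noise_std=0.01, seed=0):
    """Simulate a Markov-switching linear system (illustrative assumption).

    A: (K, d, d) per-mode linear dynamics matrices
    c: (K, d)    per-mode offsets
    P: (K, K)    row-stochastic mode transition matrix
    """
    rng = np.random.default_rng(seed)
    num_modes = P.shape[0]
    x = np.asarray(x0, dtype=float)
    z = 0  # start in mode 0 (arbitrary choice for this sketch)
    states, modes = [x.copy()], [z]
    for _ in range(steps):
        # Sample the next discrete mode from the current mode's transition row.
        z = int(rng.choice(num_modes, p=P[z]))
        # Apply the linear dynamics of the active mode plus Gaussian noise.
        x = A[z] @ x + c[z] + noise_std * rng.standard_normal(x.shape)
        states.append(x.copy())
        modes.append(z)
    return np.array(states), np.array(modes)


# Hypothetical two-mode system in two dimensions.
A = np.stack([
    np.array([[0.9, 0.1], [0.0, 0.9]]),
    np.array([[0.5, -0.2], [0.2, 0.5]]),
])
c = np.array([[0.0, 0.0], [0.1, -0.1]])
P = np.array([[0.95, 0.05], [0.10, 0.90]])

states, modes = simulate_slds(A, c, P, x0=[1.0, 0.0], steps=200)
```

Each mode on its own is a simple linear system that is easy to inspect, while the switching between modes lets the overall trajectory behave nonlinearly. A learning procedure such as the EM algorithm described in the abstract would work in the opposite direction, inferring the modes and per-mode parameters from observed trajectories.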