Paper Title

Encoding formulas as deep networks: Reinforcement learning for zero-shot execution of LTL formulas

Paper Authors

Yen-Ling Kuo, Boris Katz, Andrei Barbu

Paper Abstract

We demonstrate a reinforcement learning agent which uses a compositional recurrent neural network that takes as input an LTL formula and determines satisfying actions. The input LTL formulas have never been seen before, yet the network performs zero-shot generalization to satisfy them. This is a novel form of multi-task learning for RL agents where agents learn from one diverse set of tasks and generalize to a new set of diverse tasks. The formulation of the network enables this capacity to generalize. We demonstrate this ability in two domains. In a symbolic domain, the agent finds a sequence of letters that is accepted. In a Minecraft-like environment, the agent finds a sequence of actions that conform to the formula. While prior work could learn to execute one formula reliably given examples of that formula, we demonstrate how to encode all formulas reliably. This could form the basis of new multi-task agents that discover sub-tasks and execute them without any additional training, as well as agents that follow more complex linguistic commands. The structures required for this generalization are specific to LTL formulas, which opens up an interesting theoretical question: what structures are required in neural networks for zero-shot generalization to different logics?
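
The abstract states the key architectural idea without detail: the network's wiring mirrors the structure of the input LTL formula, so an unseen formula is handled by recombining per-operator sub-networks rather than by retraining. Below is a minimal sketch of that compositional idea, assuming PyTorch; the module names (FormulaNet, Policy), the hidden/observation/action sizes, and the toy formula "F (a & b)" are illustrative assumptions, not the authors' implementation.

# Illustrative sketch only: a network assembled along an LTL formula's
# parse tree, with one recurrent cell per operator/proposition node.
# Dimensions, names, and the toy formula are assumptions, not the
# paper's released code.
import torch
import torch.nn as nn

OBS, HID, N_ACTIONS = 8, 32, 4   # assumed observation/hidden/action sizes

class FormulaNet(nn.Module):
    """One GRU cell per parse-tree node. Each node reads the current
    observation plus its children's hidden states at every step, so the
    network's wiring mirrors the formula's structure."""
    def __init__(self, tree):
        super().__init__()
        op, children = tree                      # e.g. ("F", [subtree])
        self.kids = nn.ModuleList(FormulaNet(c) for c in children)
        self.cell = nn.GRUCell(OBS + len(children) * HID, HID)
        self.h = None                            # per-episode hidden state

    def reset(self):
        self.h = None
        for k in self.kids:
            k.reset()

    def forward(self, obs):
        child_h = [k(obs) for k in self.kids]    # recurse over the tree
        if self.h is None:
            self.h = torch.zeros(obs.shape[0], HID)
        self.h = self.cell(torch.cat([obs] + child_h, dim=-1), self.h)
        return self.h

class Policy(nn.Module):
    """Action scores read off the root node's hidden state."""
    def __init__(self, tree):
        super().__init__()
        self.formula_net = FormulaNet(tree)
        self.head = nn.Linear(HID, N_ACTIONS)

    def forward(self, obs):
        return self.head(self.formula_net(obs))

# Toy parse tree for "F (a & b)": eventually both a and b hold.
tree = ("F", [("&", [("a", []), ("b", [])])])
policy = Policy(tree)
policy.formula_net.reset()                       # start of an episode
obs = torch.randn(1, OBS)                        # one observation step
print(policy(obs).shape)                         # torch.Size([1, 4])

Because the parameters live in the per-operator cells rather than in any one formula's network, a new formula only changes how the cells are wired together, which is what makes zero-shot execution of unseen formulas plausible.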
