Adversarial Imitation Learning from Video using a State Observer

Paper Title

Adversarial Imitation Learning from Video using a State Observer

Authors

Haresh Karnan, Garrett Warnell, Faraz Torabi, Peter Stone

Abstract

The imitation learning research community has recently made significant progress towards the goal of enabling artificial agents to imitate behaviors from video demonstrations alone. However, current state-of-the-art approaches developed for this problem exhibit high sample complexity due, in part, to the high-dimensional nature of video observations. Towards addressing this issue, we introduce here a new algorithm called Visual Generative Adversarial Imitation from Observation using a State Observer (VGAIfO-SO). At its core, VGAIfO-SO seeks to address sample inefficiency using a novel, self-supervised state observer, which provides estimates of lower-dimensional proprioceptive state representations from high-dimensional images. We show experimentally in several continuous control environments that VGAIfO-SO is more sample efficient than other IfO algorithms at learning from video-only demonstrations and can sometimes even achieve performance close to the Generative Adversarial Imitation from Observation (GAIfO) algorithm, which has privileged access to the demonstrator's proprioceptive state information.
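The abstract only outlines the pipeline, so the following is a minimal, hypothetical Python sketch of the two ingredients it names, not the authors' implementation. It shows a state observer that maps image features to a low-dimensional state estimate, and a GAIfO-style discriminator that scores state transitions (s, s') and turns its output into an imitation reward. Assumptions, all loudly labeled: the observer is a linear map fitted by gradient descent on (image feature, state) pairs from the imitator's own experience (a stand-in for a learned network), the discriminator is plain logistic regression, the reward uses the common -log(1 - D) convention, and every function name here is illustrative.

```python
import math


def observe(W, b, x):
    """Hypothetical linear state observer: map image features x to a state estimate W x + b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def fit_linear_observer(images, states, lr=0.2, epochs=1000):
    """Fit the observer by full-batch gradient descent on mean 0.5 * ||W x + b - s||^2.

    Assumption: (image features, proprioceptive state) pairs are available from the
    imitator's own rollouts, which serve as the supervision signal.
    """
    img_dim, state_dim, n = len(images[0]), len(states[0]), len(images)
    W = [[0.0] * img_dim for _ in range(state_dim)]
    b = [0.0] * state_dim
    for _ in range(epochs):
        grad_W = [[0.0] * img_dim for _ in range(state_dim)]
        grad_b = [0.0] * state_dim
        for x, s in zip(images, states):
            pred = observe(W, b, x)
            for i in range(state_dim):
                err = pred[i] - s[i]
                grad_b[i] += err / n
                for j in range(img_dim):
                    grad_W[i][j] += err * x[j] / n
        for i in range(state_dim):
            b[i] -= lr * grad_b[i]
            for j in range(img_dim):
                W[i][j] -= lr * grad_W[i][j]
    return W, b


def _features(s, s_next):
    # Transition features [s, s', 1]; the trailing 1 acts as a bias term.
    return list(s) + list(s_next) + [1.0]


def _logit(theta, s, s_next):
    return sum(t * f for t, f in zip(theta, _features(s, s_next)))


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def disc_prob(theta, s, s_next):
    """Estimated probability that transition (s, s') came from the demonstrator."""
    return _sigmoid(_logit(theta, s, s_next))


def disc_step(theta, expert, imitator, lr=0.1):
    """One gradient-ascent step on mean log D(expert) + mean log(1 - D(imitator))."""
    grad = [0.0] * len(theta)
    for s, s_next in expert:
        p = disc_prob(theta, s, s_next)
        for k, f in enumerate(_features(s, s_next)):
            grad[k] += (1.0 - p) * f / len(expert)
    for s, s_next in imitator:
        p = disc_prob(theta, s, s_next)
        for k, f in enumerate(_features(s, s_next)):
            grad[k] -= p * f / len(imitator)
    return [t + lr * g for t, g in zip(theta, grad)]


def imitation_reward(theta, s, s_next):
    """Reward -log(1 - D(s, s')): larger when a transition looks like the demonstrator's."""
    z = _logit(theta, s, s_next)
    # -log(1 - sigmoid(z)) equals softplus(z); computed here in a stable form.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

In a complete VGAIfO-SO-style loop one would presumably apply the observer to demonstration video frames, feed the estimated state transitions to the discriminator, and optimize the policy on the resulting reward with a reinforcement learning algorithm; those parts are omitted from this sketch.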
