Paper title
Adversarial Imitation Learning from Video using a State Observer
Paper authors
Abstract
The imitation learning research community has recently made significant progress towards enabling artificial agents to imitate behaviors from video demonstrations alone. However, current state-of-the-art approaches to this problem exhibit high sample complexity due, in part, to the high-dimensional nature of video observations. To address this issue, we introduce a new algorithm called Visual Generative Adversarial Imitation from Observation using a State Observer (VGAIfO-SO). At its core, VGAIfO-SO seeks to address sample inefficiency using a novel, self-supervised state observer, which provides estimates of lower-dimensional proprioceptive state representations from high-dimensional images. We show experimentally in several continuous control environments that VGAIfO-SO is more sample efficient than other imitation from observation (IfO) algorithms at learning from video-only demonstrations, and that it can sometimes even achieve performance close to that of the Generative Adversarial Imitation from Observation (GAIfO) algorithm, which has privileged access to the demonstrator's proprioceptive state information.
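The abstract does not say how the state observer is built or trained, so the following is only a toy sketch of the general idea: a model that maps high-dimensional images to a lower-dimensional proprioceptive state estimate, fitted on (image, state) pairs that the imitating agent is assumed to be able to collect from its own rollouts. The linear ridge-regression observer, the `render` function, the image size, and the state size are all hypothetical stand-ins chosen to keep the example small; they are not the architecture or training procedure of VGAIfO-SO.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 3          # hypothetical proprioceptive state size
IMAGE_SHAPE = (8, 8)   # hypothetical tiny "camera" resolution

# Hypothetical toy renderer: a fixed linear map from state to pixels, plus noise.
# A real environment would produce actual camera frames instead.
_PROJECTION = rng.normal(size=(STATE_DIM, IMAGE_SHAPE[0] * IMAGE_SHAPE[1]))


def render(states):
    """Turn a batch of states (n, STATE_DIM) into images (n, 8, 8)."""
    pixels = states @ _PROJECTION
    pixels += 0.01 * rng.normal(size=pixels.shape)
    return pixels.reshape(len(states), *IMAGE_SHAPE)


def _features(images):
    """Flatten images and append a constant bias column."""
    flat = images.reshape(len(images), -1)
    return np.hstack([flat, np.ones((len(flat), 1))])


def fit_state_observer(images, states, ridge=1e-3):
    """Fit a linear image-to-state map by ridge regression (assumed model)."""
    X = _features(images)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ states)


def observe_state(weights, images):
    """Estimate low-dimensional states from images."""
    return _features(images) @ weights


# Assumed self-supervision: the agent sees both its own images and its own states.
train_states = rng.normal(size=(200, STATE_DIM))
weights = fit_state_observer(render(train_states), train_states)

# Video-only "demonstration": only the frames are given to the observer.
demo_states = rng.normal(size=(50, STATE_DIM))
demo_frames = render(demo_states)
est = observe_state(weights, demo_frames)
mean_abs_error = float(np.mean(np.abs(est - demo_states)))
print(est.shape, mean_abs_error)
```

In a pipeline of this kind, the estimated states could stand in for the proprioceptive states that a GAIfO-style discriminator would otherwise need, so the adversarial comparison would run in the low-dimensional space rather than on raw pixels. Whether VGAIfO-SO uses the estimates in exactly this way is not stated in the abstract.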