Palm Up: Playing in the Latent Manifold for Unsupervised Pretraining

Paper Title

Palm up: Playing in the Latent Manifold for Unsupervised Pretraining

Authors

Hao Liu, Tom Zahavy, Volodymyr Mnih, Satinder Singh

Abstract

Large and diverse datasets have been the cornerstones of many impressive advancements in artificial intelligence. Intelligent creatures, however, learn by interacting with the environment, which changes the input sensory signals and the state of the environment. In this work, we aim to bring the best of both worlds and propose an algorithm that exhibits exploratory behavior while utilizing large, diverse datasets. Our key idea is to leverage deep generative models that are pretrained on static datasets and introduce a dynamic model in the latent space. The transition dynamics simply mix an action and a randomly sampled latent, then apply an exponential moving average for temporal persistency; the resulting latent is decoded to an image using the pretrained generator. We then employ an unsupervised reinforcement learning algorithm to explore in this environment and perform unsupervised representation learning on the collected data. We further leverage the temporal information of this data to pair data points as natural supervision for representation learning. Our experiments suggest that the learned representations can be successfully transferred to downstream tasks in both vision and reinforcement learning domains.
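
The latent-space transition described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no exact formulas, so the convex mixing rule, the coefficients `mix` and `ema`, and the names `LatentEnv`, `rollout`, and `temporal_pairs` are all assumptions made for illustration. A fixed random linear map stands in for the pretrained generator, and a random policy stands in for the unsupervised reinforcement learning agent.

```python
import numpy as np


class LatentEnv:
    """Toy latent-space environment (hypothetical; all constants are assumptions)."""

    def __init__(self, latent_dim=8, image_shape=(4, 4), mix=0.5, ema=0.9, seed=0):
        self.rng = np.random.default_rng(seed)
        self.latent_dim = latent_dim
        self.image_shape = image_shape
        self.mix = mix    # assumed weight of the action against the random latent
        self.ema = ema    # assumed exponential-moving-average coefficient
        # Stand-in for a pretrained generator: a fixed random linear map.
        n_pixels = int(np.prod(image_shape))
        self.weights = self.rng.standard_normal((latent_dim, n_pixels))
        self.z = self.rng.standard_normal(latent_dim)

    def decode(self, z):
        # Map a latent vector to an "image" with values in [-1, 1].
        return np.tanh(z @ self.weights).reshape(self.image_shape)

    def step(self, action):
        action = np.asarray(action, dtype=float)
        noise = self.rng.standard_normal(self.latent_dim)
        # Mix the action with a randomly sampled latent (assumed convex mix).
        proposal = self.mix * action + (1.0 - self.mix) * noise
        # The moving average keeps consecutive latents close (temporal persistency).
        self.z = self.ema * self.z + (1.0 - self.ema) * proposal
        return self.decode(self.z)


def rollout(env, steps=10):
    # Random actions are a placeholder for the exploration policy.
    frames = []
    for _ in range(steps):
        action = env.rng.standard_normal(env.latent_dim)
        frames.append(env.step(action))
    return frames


def temporal_pairs(frames):
    # Pair each frame with its successor as a candidate positive pair.
    return list(zip(frames[:-1], frames[1:]))
```

Calling `temporal_pairs(rollout(LatentEnv(), steps=5))` yields four pairs of consecutive frames, which is one simple way to read the abstract's idea of using temporal information as supervision. The actual pairing scheme used in the paper may differ.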
