Viewmaker Networks: Learning Views for Unsupervised Representation Learning

Paper title

Viewmaker Networks: Learning Views for Unsupervised Representation Learning

Authors

Alex Tamkin, Mike Wu, Noah Goodman

Abstract

Many recent methods for unsupervised representation learning train models to be invariant to different "views," or distorted versions of an input. However, designing these views requires considerable trial and error by human experts, hindering widespread adoption of unsupervised representation learning methods across domains and modalities. To address this, we propose viewmaker networks: generative models that learn to produce useful views from a given input. Viewmakers are stochastic bounded adversaries: they produce views by generating and then adding an $\ell_p$-bounded perturbation to the input, and are trained adversarially with respect to the main encoder network. Remarkably, when pretraining on CIFAR-10, our learned views enable comparable transfer accuracy to the well-tuned SimCLR augmentations -- despite not including transformations like cropping or color jitter. Furthermore, our learned views significantly outperform baseline augmentations on speech recordings (+9% points, on average) and wearable sensor data (+17% points). Viewmakers can also be combined with handcrafted views: they improve robustness to common image corruptions and can increase transfer performance in cases where handcrafted views are less explored. These results suggest that viewmakers may provide a path towards more general representation learning algorithms -- reducing the domain expertise and effort needed to pretrain on a much wider set of domains. Code is available at https://github.com/alextamkin/viewmaker.
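
The core mechanism described in the abstract, adding an $\ell_p$-bounded perturbation to an input to form a view, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `bound_perturbation` and `make_view`, the budget `eps`, and the use of Gaussian noise in place of a learned generator are all assumptions made for illustration, and the adversarial training against the encoder is not shown.

```python
import numpy as np

def bound_perturbation(delta, eps, p=1):
    """Rescale delta so its l_p norm is at most eps (hypothetical helper)."""
    norm = np.linalg.norm(delta.ravel(), ord=p)
    if norm <= eps:
        # Already inside the l_p ball of radius eps: leave unchanged.
        return delta
    # Shrink uniformly so the norm equals eps exactly.
    return delta * (eps / norm)

def make_view(x, eps, p=1, rng=None):
    """Return x plus a random l_p-bounded perturbation.

    Assumption: a trained viewmaker would produce delta with a neural
    network conditioned on x and random noise; Gaussian noise stands in
    for that network here.
    """
    rng = np.random.default_rng() if rng is None else rng
    delta = rng.standard_normal(x.shape)
    return x + bound_perturbation(delta, eps, p)

# Example: two stochastic views of the same (dummy) image.
x = np.zeros((3, 32, 32))
rng = np.random.default_rng(0)
view_a = make_view(x, eps=0.5, p=1, rng=rng)
view_b = make_view(x, eps=0.5, p=1, rng=rng)
```

Because the perturbation is sampled afresh on each call, the same input yields different views, which is what a contrastive objective such as SimCLR needs as positive pairs. Uniform rescaling is only one way to enforce the bound; for $p = \infty$, element-wise clipping would be a common alternative.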
