Paper Title
Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling
Paper Authors
Paper Abstract
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors. This approach introduces a trade-off between disentangled representation learning and reconstruction quality, since the model lacks the capacity to learn correlated latent variables that capture the detailed information present in most image data. To overcome this trade-off, we present a novel multi-stage modeling approach in which the disentangled factors are first learned using a penalty-based disentangled representation learning method; the low-quality reconstruction is then improved with another deep generative model trained to model the missing correlated latent variables, adding detailed information while maintaining conditioning on the previously learned disentangled factors. Taken together, our multi-stage modeling approach results in a single, coherent probabilistic model that is theoretically justified by the principle of d-separation and can be realized with a variety of model classes, including likelihood-based models such as variational autoencoders, implicit models such as generative adversarial networks, and tractable models such as normalizing flows or mixtures of Gaussians. We demonstrate that our multi-stage model achieves higher reconstruction quality than current state-of-the-art methods with equivalent disentanglement performance across multiple standard benchmarks. In addition, we apply the multi-stage model to generate synthetic tabular datasets, showing improved performance over benchmark models across a variety of metrics. An interpretability analysis further indicates that the multi-stage model can effectively uncover distinct and meaningful features of variation from which the original distribution can be recovered.
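To make the two-stage structure concrete, below is a minimal PyTorch sketch of the idea as the abstract describes it: a penalty-based first stage (here a beta-VAE) learns the disentangled code, and a second generative model (here a conditional VAE) learns the missing correlated "detail" latents while conditioning on the frozen first-stage code. All names, architectures, and hyperparameters (Stage1VAE, Stage2Refiner, beta=4.0, the layer sizes) are illustrative assumptions, not the paper's actual models.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Stage1VAE(nn.Module):
    """Stage 1: penalty-based disentangled learner (a beta-VAE sketch)."""
    def __init__(self, x_dim=784, z_dim=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(), nn.Linear(256, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(z), mu, logvar

def stage1_loss(x, x_hat, mu, logvar, beta=4.0):
    # beta > 1 penalizes the posterior more heavily, encouraging statistically
    # independent latents at the cost of reconstruction detail -- the trade-off
    # the abstract describes.
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

class Stage2Refiner(nn.Module):
    """Stage 2: models correlated detail latents w conditioned on the frozen
    disentangled code z, so the refinement cannot alter the learned factors."""
    def __init__(self, x_dim=784, z_dim=10, w_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + z_dim, 256), nn.ReLU(), nn.Linear(256, 2 * w_dim))
        self.dec = nn.Sequential(nn.Linear(w_dim + z_dim, 256), nn.ReLU(), nn.Linear(256, x_dim))

    def forward(self, x, z):
        mu, logvar = self.enc(torch.cat([x, z], dim=-1)).chunk(2, dim=-1)
        w = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(torch.cat([w, z], dim=-1)), mu, logvar

# Sketch of one stage-2 training step. In practice stage 1 would be fully
# trained first (with stage1_loss) and then frozen.
stage1, stage2 = Stage1VAE(), Stage2Refiner()
opt2 = torch.optim.Adam(stage2.parameters(), lr=1e-3)
x = torch.rand(64, 784)                      # placeholder batch
with torch.no_grad():                        # disentangled code is fixed
    _, z, _ = stage1(x)                      # use the posterior mean as the code
x_hat, mu, logvar = stage2(x, z)
recon = F.mse_loss(x_hat, x, reduction="sum")
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
opt2.zero_grad()
(recon + kl).backward()
opt2.step()

The key design choice the sketch illustrates is that stage 2 receives z only as conditioning input and never updates stage 1, so the detail latents w absorb the correlated information the penalized posterior could not, without disturbing the disentangled factors. The abstract notes the second stage could equally be a GAN, normalizing flow, or mixture of Gaussians; the conditional VAE here is just one of those choices.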