Paper Title


Benchmarking Multimodal Variational Autoencoders: CdSprites+ Dataset and Toolkit

Authors

Gabriela Sejnova, Michal Vavrecka, Karla Stepanova, Tadahiro Taniguchi

Abstract


Multimodal Variational Autoencoders (VAEs) have been the subject of intense research in recent years, as they can integrate multiple modalities into a joint representation and can thus serve as a promising tool for both data classification and generation. Several approaches to multimodal VAE learning have been proposed so far; however, their comparison and evaluation have been rather inconsistent. One reason is that the models differ at the implementation level; another is that the datasets commonly used in these cases were not originally designed to evaluate multimodal generative models. This paper addresses both issues. First, we propose a toolkit for systematic multimodal VAE training and comparison. The toolkit currently comprises 4 existing multimodal VAEs and 6 commonly used benchmark datasets, along with instructions on how to easily add a new model or dataset. Second, we present a disentangled bimodal dataset designed to comprehensively evaluate joint-generation and cross-generation capabilities across multiple difficulty levels. We demonstrate the utility of our dataset by comparing the implemented state-of-the-art models.
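As an illustration of how multimodal VAEs fuse per-modality posteriors into the joint representation the abstract refers to, the sketch below shows a product-of-experts combination of two Gaussian encoder outputs (the fusion rule used by some multimodal VAE variants, e.g. MVAE). This is a minimal NumPy sketch with a hypothetical helper name; it is not taken from the paper's toolkit.

```python
import numpy as np

def product_of_experts(mus, logvars):
    """Fuse per-modality Gaussian posteriors N(mu_i, var_i) into one
    joint Gaussian by multiplying the experts: precisions add, and the
    joint mean is the precision-weighted average of the expert means.
    Hypothetical helper for illustration, not the paper's API."""
    precisions = np.exp(-np.asarray(logvars))          # 1 / var_i per expert
    joint_var = 1.0 / precisions.sum(axis=0)           # combined variance
    joint_mu = joint_var * (np.asarray(mus) * precisions).sum(axis=0)
    return joint_mu, np.log(joint_var)

# Two modalities with equal confidence, latent means 0 and 2:
mu, logvar = product_of_experts([[0.0], [2.0]], [[0.0], [0.0]])
# Equal precisions -> joint mean 1.0, joint variance halved to 0.5.
```

Because precisions add, a modality whose encoder is uncertain (large variance) contributes little to the joint posterior, which is what makes this fusion rule convenient for cross-generation from a subset of modalities.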
