Paper Title
Toward a Generalization Metric for Deep Generative Models
Paper Authors
Paper Abstract
Measuring the generalization capacity of Deep Generative Models (DGMs) is difficult because of the curse of dimensionality. Evaluation metrics for DGMs such as Inception Score, Fréchet Inception Distance, Precision-Recall, and Neural Net Divergence (NND) try to estimate the distance between the generated distribution and the target distribution using a polynomial number of samples. Researchers treat these metrics as targets when designing new models. Despite these claims, it is still unclear how well they can measure the generalization capacity of a generative model. In this paper, we investigate the ability of these metrics to measure generalization. We introduce a framework for comparing the robustness of evaluation metrics. We show that better scores under these metrics do not imply better generalization: they can easily be fooled by a generator that memorizes a small subset of the training set. We propose a fix to the NND metric to make it more robust to noise in the generated data. Toward building a robust metric for generalization, we propose applying the Minimum Description Length principle to the problem of evaluating DGMs. We develop an efficient method for estimating the complexity of Generative Latent Variable Models (GLVMs). Experimental results show that our metric can effectively detect training set memorization and distinguish GLVMs with different generalization capacities. Source code is available at https://github.com/htt210/GeneralizationMetricGAN.
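As a minimal illustrative sketch of the memorization argument (not code from the paper: it applies the standard Fréchet distance formula to synthetic feature vectors rather than to actual Inception features), the snippet below shows how a generator that simply replays a small memorized subset of the training data can still achieve a score close to that of genuinely fresh samples:

```python
# Illustrative sketch (not from the paper): a generator that memorizes a small
# subset of the training set can still obtain a good Fréchet-style score,
# because the score only compares the first two moments of the feature
# distributions, not whether the samples are novel.
import numpy as np
from scipy import linalg

def frechet_distance(x, y):
    """Fréchet distance between Gaussians fitted to the sample sets x and y."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_x @ cov_y, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerical error
    return np.sum((mu_x - mu_y) ** 2) + np.trace(cov_x + cov_y - 2 * covmean)

rng = np.random.default_rng(0)
train = rng.normal(size=(10_000, 16))  # stand-in for training-set features
test = rng.normal(size=(10_000, 16))   # held-out features, same distribution

# "Memorizing generator": replay a small memorized subset of the training set.
memorized = train[rng.choice(len(train), size=500, replace=False)]
replayed = memorized[rng.choice(len(memorized), size=10_000, replace=True)]

print("FID(test, fresh samples):   %.3f" % frechet_distance(test, rng.normal(size=(10_000, 16))))
print("FID(test, memorizing gen.): %.3f" % frechet_distance(test, replayed))
```

Because such moment-matching scores only compare summary statistics of two sample sets, replaying memorized points matches those statistics nearly as well as genuine generalization does, which is the failure mode the abstract describes.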