Hierarchical Sliced Wasserstein Distance

Paper Title

Hierarchical Sliced Wasserstein Distance

Authors

Khai Nguyen, Tongzheng Ren, Huy Nguyen, Litu Rout, Tan Nguyen, Nhat Ho

Abstract

Sliced Wasserstein (SW) distance has been widely used in different application scenarios since it can be scaled to a large number of supports without suffering from the curse of dimensionality. The value of sliced Wasserstein distance is the average of transportation cost between one-dimensional representations (projections) of original measures that are obtained by Radon Transform (RT). Despite its efficiency in the number of supports, estimating the sliced Wasserstein requires a relatively large number of projections in high-dimensional settings. Therefore, for applications where the number of supports is relatively small compared with the dimension, e.g., several deep learning applications where the mini-batch approaches are utilized, the complexities from matrix multiplication of Radon Transform become the main computational bottleneck. To address this issue, we propose to derive projections by linearly and randomly combining a smaller number of projections which are named bottleneck projections. We explain the usage of these projections by introducing Hierarchical Radon Transform (HRT) which is constructed by applying Radon Transform variants recursively. We then formulate the approach into a new metric between measures, named Hierarchical Sliced Wasserstein (HSW) distance. By proving the injectivity of HRT, we derive the metricity of HSW. Moreover, we investigate the theoretical properties of HSW including its connection to SW variants and its computational and sample complexities. Finally, we compare the computational cost and generative quality of HSW with the conventional SW on the task of deep generative modeling using various benchmark datasets including CIFAR10, CelebA, and Tiny ImageNet.
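
The bottleneck idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the default parameter values, and the choice to draw both the bottleneck directions and the mixing vectors uniformly from unit spheres are assumptions made here for illustration, and both inputs are assumed to be equal-size point clouds with uniform weights. The sketch contrasts projecting d-dimensional samples onto n_proj directions directly with projecting onto n_bottleneck directions first and then mixing those one-dimensional projections linearly and randomly.

```python
import numpy as np


def _unit_rows(rng, rows, cols):
    # Rows drawn uniformly from the unit sphere in R^cols (assumed sampling scheme).
    v = rng.standard_normal((rows, cols))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def _sliced_cost(px, py, p):
    # px, py: (n, L) arrays of one-dimensional projections.
    # In 1-D, optimal transport between equal-size samples matches sorted values.
    px = np.sort(px, axis=0)
    py = np.sort(py, axis=0)
    return np.mean(np.abs(px - py) ** p) ** (1.0 / p)


def sliced_wasserstein(x, y, n_proj=100, p=2, seed=0):
    # Conventional Monte Carlo SW: project directly onto n_proj directions.
    # The projection step costs O(n * d * n_proj).
    rng = np.random.default_rng(seed)
    theta = _unit_rows(rng, n_proj, x.shape[1])
    return _sliced_cost(x @ theta.T, y @ theta.T, p)


def hierarchical_sliced_wasserstein(x, y, n_bottleneck=10, n_proj=100, p=2, seed=0):
    # Hypothetical HSW-style estimator: n_bottleneck bottleneck projections,
    # then n_proj random linear combinations of them.
    # The projection step costs O(n * d * n_bottleneck + n * n_bottleneck * n_proj).
    rng = np.random.default_rng(seed)
    theta = _unit_rows(rng, n_bottleneck, x.shape[1])  # bottleneck directions
    psi = _unit_rows(rng, n_proj, n_bottleneck)        # mixing vectors
    bx = x @ theta.T                                   # (n, n_bottleneck)
    by = y @ theta.T
    return _sliced_cost(bx @ psi.T, by @ psi.T, p)
```

When n_bottleneck is much smaller than n_proj and the dimension is large relative to the number of samples, the second function replaces one large matrix product with two smaller ones, which is the regime the abstract identifies as the bottleneck for mini-batch training.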
