Paper title
Investigation of Different Calibration Methods for Deep Speaker Embedding based Verification Systems
Authors
Abstract
Deep speaker embedding extractors have already become the new state-of-the-art systems in the speaker verification field. However, the problem of verification score calibration for such systems often remains out of focus. Improper score calibration leads to serious issues, especially under unknown acoustic conditions, even when the speaker verification system is strong in terms of threshold-free metrics. This paper presents an investigation of several score calibration methods: a classical approach based on the logistic regression model; the recently presented magnitude estimation network MagnetO, which uses activations from the pooling layer of the trained deep speaker extractor; and a generalization of this approach based on separate scale and offset prediction neural networks. An additional focus of this research is to estimate the impact of score normalization on the calibration performance of the system. The obtained results demonstrate that there are no serious problems if in-domain development data are used for calibration tuning. Otherwise, a trade-off arises between good calibration performance and threshold-free system quality. In most cases, using adaptive s-norm helps to stabilize score distributions and to improve system performance. Meanwhile, some experiments demonstrate that the novel approaches are limited in their ability to stabilize scores on several datasets.
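Two of the building blocks named in the abstract, adaptive s-norm and classical logistic-regression calibration, can be stated concretely. The following is a minimal NumPy sketch, not the paper's code. It assumes cosine scoring of embeddings, a top-N cohort size of 200, and a prior-weighted logistic objective fitted by Newton steps (the Cllr-style formulation common in speaker verification). The function names (`as_norm`, `train_calibration`, `cllr`) and the synthetic Gaussian scores are hypothetical illustrations. MagnetO and the scale/offset prediction networks are not reproduced here.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function: 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def cosine_scores(a, b):
    """Cosine similarity between every row of `a` and every row of `b`."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T


def as_norm(score, enroll_emb, test_emb, cohort, top_n=200):
    """Adaptive s-norm of one trial score (hypothetical helper).

    The enrollment and the test embedding are each scored against the cohort;
    mean and std are taken over the `top_n` highest cohort scores only
    (the "adaptive" part), and the two z-normalized scores are averaged.
    """
    stats = []
    for emb in (enroll_emb, test_emb):
        cohort_scores = cosine_scores(emb[None, :], cohort)[0]
        top = np.sort(cohort_scores)[-top_n:]
        stats.append((top.mean(), top.std()))
    (mu_e, sd_e), (mu_t, sd_t) = stats
    return 0.5 * ((score - mu_e) / sd_e + (score - mu_t) / sd_t)


def train_calibration(tar, non, prior=0.5, iters=50):
    """Fit llr = a * s + b by prior-weighted logistic regression (Newton steps).

    Trials are weighted by prior / N_tar and (1 - prior) / N_non; at
    prior = 0.5 the objective equals Cllr multiplied by ln 2.
    """
    s = np.concatenate([tar, non])
    y = np.concatenate([np.ones(len(tar)), np.zeros(len(non))])
    w = np.concatenate([np.full(len(tar), prior / len(tar)),
                        np.full(len(non), (1.0 - prior) / len(non))])
    X = np.stack([s, np.ones_like(s)], axis=1)
    logit_prior = np.log(prior / (1.0 - prior))
    theta = np.zeros(2)  # [a, b]
    for _ in range(iters):
        p = sigmoid(X @ theta + logit_prior)  # posterior P(target | s)
        grad = X.T @ (w * (p - y))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        theta -= step
        if np.max(np.abs(step)) < 1e-10:
            break
    return theta[0], theta[1]


def cllr(tar_llr, non_llr):
    """Log-likelihood-ratio cost in bits (0 = perfect; llr = 0 everywhere gives 1)."""
    c_tar = np.mean(np.logaddexp(0.0, -tar_llr))
    c_non = np.mean(np.logaddexp(0.0, non_llr))
    return 0.5 * (c_tar + c_non) / np.log(2.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical raw cosine scores: targets ~ N(0.6, 0.15), non-targets ~ N(0.1, 0.15).
    tar = rng.normal(0.6, 0.15, 5000)
    non = rng.normal(0.1, 0.15, 5000)
    a, b = train_calibration(tar, non)
    print(f"a={a:.2f} b={b:.2f}")
    print(f"Cllr raw={cllr(tar, non):.3f} "
          f"calibrated={cllr(a * tar + b, a * non + b):.3f}")
```

Because the fitted map `a * s + b` is monotonic for `a > 0`, it leaves threshold-free metrics such as EER unchanged and only moves the scores onto a log-likelihood-ratio scale. This is why a system can be strong on EER and still be poorly calibrated. In this sketch, the in-domain versus out-of-domain distinction in the abstract corresponds to which trials supply the `tar`/`non` scores passed to `train_calibration`.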