SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing

Paper Title

SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing

Authors

Siwen Ding, You Zhang, Zhiyao Duan

Abstract

Voice anti-spoofing systems are crucial auxiliaries for automatic speaker verification (ASV) systems. A major challenge is caused by unseen attacks empowered by advanced speech synthesis technologies. Our previous research on one-class learning has improved the generalization ability to unseen attacks by compacting the bona fide speech in the embedding space. However, such compactness lacks consideration of the diversity of speakers. In this work, we propose speaker attractor multi-center one-class learning (SAMO), which clusters bona fide speech around a number of speaker attractors and pushes away spoofing attacks from all the attractors in a high-dimensional embedding space. For training, we propose an algorithm for the co-optimization of bona fide speech clustering and bona fide/spoof classification. For inference, we propose strategies to enable anti-spoofing for speakers without enrollment. Our proposed system outperforms existing state-of-the-art single systems with a relative improvement of 38% on equal error rate (EER) on the ASVspoof2019 LA evaluation set.
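
The abstract describes the scoring idea only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes that each speaker attractor is the normalized mean of that speaker's bona fide embeddings, that closeness is measured with cosine similarity, and that a speaker without enrollment is scored against the nearest of all known attractors. The function names `speaker_attractors` and `score` are hypothetical, and the two-dimensional embeddings are toy values chosen for illustration.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def speaker_attractors(embeddings, speakers):
    """Assumption: one attractor per speaker, the normalized mean of
    that speaker's unit-length bona fide embeddings."""
    sums, counts = {}, {}
    for emb, spk in zip(embeddings, speakers):
        emb = normalize(emb)
        if spk not in sums:
            sums[spk] = [0.0] * len(emb)
            counts[spk] = 0
        sums[spk] = [s + e for s, e in zip(sums[spk], emb)]
        counts[spk] += 1
    return {
        spk: normalize([s / counts[spk] for s in vec])
        for spk, vec in sums.items()
    }


def score(embedding, attractors, speaker=None):
    """Higher means more likely bona fide.

    Assumption: with an enrolled speaker, compare against that speaker's
    attractor; otherwise fall back to the closest attractor overall.
    """
    if speaker in attractors:
        return cosine(embedding, attractors[speaker])
    return max(cosine(embedding, a) for a in attractors.values())


# Toy bona fide embeddings for two hypothetical speakers.
bona_fide = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["alice", "alice", "bob", "bob"]
attractors = speaker_attractors(bona_fide, labels)

enrolled_score = score([1.0, 0.05], attractors, speaker="alice")
unenrolled_spoof_score = score([-1.0, -1.0], attractors)
```

In this toy setup, an utterance close to its speaker's attractor scores near 1, while an embedding that lies far from every attractor scores low even when the speaker is not enrolled. A real system would learn the embeddings jointly with the attractors, as the co-optimization described in the abstract suggests.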

