Paper Title
Adversarial Disentanglement of Speaker Representation for Attribute-Driven Privacy Preservation
Authors
Abstract
In speech technologies, a speaker's voice representation is used in many applications, such as speech recognition, voice conversion, speech synthesis and, obviously, user authentication. Modern voice representations of the speaker are based on neural embeddings. In addition to the targeted information, these representations usually contain sensitive information about the speaker, such as age, sex, physical state, education level or ethnicity. To let users choose which information to protect, we introduce in this paper the concept of attribute-driven privacy preservation in speaker voice representation. It allows a person to hide one or more personal aspects from a potentially malicious interceptor and from the application provider. As a first solution to this concept, we propose an adversarial autoencoding method that disentangles a given speaker attribute within the voice representation, thus allowing its concealment. We focus here on the sex attribute for an Automatic Speaker Verification (ASV) task. Experiments carried out on the VoxCeleb datasets show that the proposed method enables the concealment of this attribute while preserving ASV ability.
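The abstract does not specify the architecture or the loss functions, so the following is only an illustrative sketch of what an adversarial autoencoding objective of this kind might look like, not the authors' implementation. Everything in it is an assumption: the linear encoder and decoder, the logistic adversary, the mean-squared reconstruction loss, the weight `lam`, and the random stand-in data. It shows only the forward computation of the competing losses: the autoencoder tries to reconstruct the embedding while making the attribute hard to predict from the latent code, and the decoder receives the attribute as a separate input.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(p, y, eps=1e-7):
    # Mean cross-entropy between predicted probabilities p and binary labels y.
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Hypothetical sizes: 8 embeddings of dimension 16, latent dimension 4.
n, emb_dim, latent_dim = 8, 16, 4
x = rng.normal(size=(n, emb_dim))               # stand-in speaker embeddings
y = rng.integers(0, 2, size=n).astype(float)    # stand-in binary attribute labels

# Randomly initialised linear maps stand in for trained networks (assumption).
W_enc = rng.normal(scale=0.1, size=(emb_dim, latent_dim))
W_dec = rng.normal(scale=0.1, size=(latent_dim + 1, emb_dim))
w_adv = rng.normal(scale=0.1, size=latent_dim)

def forward(x, y, attr_override=None):
    # Encode, let the adversary guess the attribute from the latent code,
    # and decode from the latent code plus an explicit attribute input.
    z = np.tanh(x @ W_enc)
    attr = y if attr_override is None else attr_override
    x_hat = np.concatenate([z, attr[:, None]], axis=1) @ W_dec
    p_adv = sigmoid(z @ w_adv)
    return z, x_hat, p_adv

def losses(x, y, lam=1.0):
    # Reconstruction loss, adversary loss, and the autoencoder objective,
    # which rewards good reconstruction and a poorly performing adversary.
    _, x_hat, p_adv = forward(x, y)
    recon = float(np.mean((x - x_hat) ** 2))
    adv = binary_cross_entropy(p_adv, y)
    autoencoder_objective = recon - lam * adv
    return recon, adv, autoencoder_objective

recon, adv, objective = losses(x, y, lam=0.5)

# Hypothetical concealment step: decode with an uninformative attribute value.
_, concealed, _ = forward(x, y, attr_override=np.full(n, 0.5))
```

In a trained system the adversary would be updated to minimise `adv` while the autoencoder minimises `autoencoder_objective`, in alternating steps. The value 0.5 fed to the decoder is one possible way to request a concealed embedding and is an assumption made here for illustration.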