Paper Title
Robust Text-Dependent Speaker Verification via Character-Level Information Preservation for the SdSV Challenge 2020
Paper Authors
Abstract
This paper describes our submission to Task 1 of the Short-duration Speaker Verification (SdSV) Challenge 2020. Task 1 is a text-dependent speaker verification task in which both the speaker and the phrase must be verified. The submitted systems were composed of TDNN-based and ResNet-based front-end architectures, in which frame-level features were aggregated with various pooling methods (e.g., statistics, self-attentive, and GhostVLAD pooling). Although conventional pooling methods yield embeddings with a sufficient amount of speaker-dependent information, our experiments show that these embeddings often lack phrase-dependent information. To mitigate this problem, we propose new pooling and score compensation methods that leverage a CTC-based automatic speech recognition (ASR) model to take the lexical content into account. Both methods improved over the conventional techniques, and the best performance was achieved by fusing all the systems we experimented with, yielding 0.0785% MinDCF and 2.23% EER on the challenge's evaluation subset.
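The abstract names statistics and self-attentive pooling as ways of turning a variable-length sequence of frame-level features into a fixed-size utterance embedding. The following is a minimal NumPy sketch of generic textbook versions of these two operations, not the authors' implementation: the single-head attention design, the parameter names `w` and `v`, and the use of population standard deviation are assumptions made for illustration, and GhostVLAD pooling is omitted.

```python
import numpy as np


def statistics_pooling(frames: np.ndarray) -> np.ndarray:
    """Concatenate the per-dimension mean and standard deviation over time.

    frames: array of shape (T, D) holding T frame-level feature vectors.
    Returns an utterance-level vector of shape (2 * D,).
    """
    mean = frames.mean(axis=0)  # (D,) average over frames
    std = frames.std(axis=0)    # (D,) population std (ddof=0) over frames
    return np.concatenate([mean, std])


def self_attentive_pooling(frames: np.ndarray, w: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-head self-attentive pooling: a learned weighted mean over frames.

    frames: (T, D); w: (D, H) projection; v: (H,) scoring vector.
    In a trained model w and v are learned; here they are plain inputs (assumption).
    Returns a vector of shape (D,).
    """
    scores = np.tanh(frames @ w) @ v        # (T,) one scalar score per frame
    scores = scores - scores.max()          # shift for a numerically stable softmax
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum() # (T,) attention weights summing to 1
    return weights @ frames                 # (D,) attention-weighted average of frames


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((200, 512))         # 200 frames of 512-dim features
    emb_stats = statistics_pooling(frames)
    w = rng.standard_normal((512, 64)) * 0.01
    v = rng.standard_normal(64)
    emb_att = self_attentive_pooling(frames, w, v)
    print(emb_stats.shape, emb_att.shape)
```

Both operations are invariant to the order of the frames: shuffling the rows of `frames` leaves the pooled vector unchanged. This is offered only as one possible intuition for why such embeddings can carry little phrase-dependent (word-order) information; it is not a claim made in the abstract.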