Paper Title
The SpeakIn System Description for CNSRC2022
Authors
Abstract
This report describes our speaker verification systems for the CN-Celeb Speaker Recognition Challenge 2022 (CNSRC 2022). The challenge includes two tasks: speaker verification (SV) and speaker retrieval (SR). The SV task has two tracks, a fixed track and an open track. In the fixed track, we used only CN-Celeb.T as the training set. For the open track of the SV task and for the SR task, we added our open-source audio data. We developed ResNet-based, RepVGG-based, and TDNN-based architectures for this challenge. Global statistic pooling and MQMHA pooling were used to aggregate frame-level features across time into utterance-level representations. We adopted AM-Softmax and AAM-Softmax combined with the Sub-Center method to classify the resulting embeddings, and used the Large-Margin Fine-Tuning strategy to further improve model performance. In the backend, Sub-Mean and AS-Norm were used. Our SV fixed-track system was a fusion of five models, our SV open-track system was a fusion of two models, and we used a single system for the SR task. Our approach ranked 1st in the open track of the SV task, 2nd in the fixed track of the SV task, and 3rd in the SR task.
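Two of the components named in the abstract, global statistic pooling and AS-Norm, can be illustrated with a minimal NumPy sketch. This is not the system's actual implementation: the function names, the cosine scoring, the cohort set, and the `top_k` value are all assumptions made for illustration, and the standard textbook forms of both techniques are used. MQMHA pooling, the margin-based losses, and Sub-Mean are not shown.

```python
import numpy as np


def statistics_pooling(frames, eps=1e-8):
    """Collapse (T, D) frame-level features into one (2*D,) utterance-level vector.

    The per-dimension mean and standard deviation over time are concatenated.
    """
    mean = frames.mean(axis=0)
    std = np.sqrt(frames.var(axis=0) + eps)  # eps keeps the sqrt stable for constant inputs
    return np.concatenate([mean, std])


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def as_norm(enroll, test, cohort, top_k=3):
    """Adaptive symmetric score normalization of a cosine trial score.

    enroll, test: (D,) embeddings; cohort: (N, D) embeddings of other speakers.
    top_k is a hypothetical setting; each side is normalized with the mean and
    standard deviation of its top_k highest cohort scores.
    """
    enroll, test, cohort = l2_normalize(enroll), l2_normalize(test), l2_normalize(cohort)
    raw = float(enroll @ test)  # cosine similarity of the trial pair

    def top_stats(emb):
        scores = np.sort(cohort @ emb)[-top_k:]  # top_k closest cohort scores
        return scores.mean(), scores.std() + 1e-8

    mu_e, sd_e = top_stats(enroll)
    mu_t, sd_t = top_stats(test)
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)


# Usage with random stand-in data (200 frames of 80-dimensional features).
rng = np.random.default_rng(0)
utt_a = statistics_pooling(rng.normal(size=(200, 80)))
utt_b = statistics_pooling(rng.normal(size=(200, 80)))
cohort = rng.normal(size=(10, 160))
score = as_norm(utt_a, utt_b, cohort)
```

In a real system the pooled vector would pass through further layers before scoring; here it is scored directly only to keep the sketch short. The normalization is symmetric, so swapping the enrollment and test embeddings gives the same score.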