A new Speech Feature Fusion method with cross gate parallel CNN for Speaker Recognition

Paper title

A new Speech Feature Fusion method with cross gate parallel CNN for Speaker Recognition

Authors

Jiacheng Zhang, Wenyi Yan, Ye Zhang

Abstract

This paper proposes a new speech feature fusion method for speaker recognition based on the cross gate parallel convolutional neural network (CG-PCNN). Several Mel filter banks, each with a different number of triangular filters, extract Mel filter bank features (MFBFs) of different frequency resolutions from every speech frame of a speaker's utterance. Because the frequency resolutions of these MFBFs differ, the features carry complementary information. The CG-PCNN extracts deep features from the MFBFs and applies a cross gate mechanism to capture this complementary information, which improves the performance of the speaker recognition system. The deep features are then concatenated to form the fusion feature used for speaker recognition. Experimental results show that a speaker recognition system using the proposed speech feature fusion method is effective and marginally outperforms existing state-of-the-art systems.
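The multi-resolution feature extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the filter counts (20, 40, 60), the FFT size, the sample rate, and the HTK-style Mel formula are all assumptions chosen for illustration, and the CG-PCNN itself is not modeled here. The sketch only shows how one power spectrum of a speech frame can yield several MFBF vectors of different frequency resolution.

```python
import math


def hz_to_mel(hz):
    # HTK-style Mel scale (an assumed choice; the abstract does not specify one)
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Return n_filters triangular filters, each with n_fft // 2 + 1 weights."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 points equally spaced on the Mel scale (edges and centres)
    mel_points = [mel_max * i / (n_filters + 1) for i in range(n_filters + 2)]
    hz_points = [mel_to_hz(m) for m in mel_points]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]

    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        row = []
        for f in bin_hz:
            if left < f <= centre:
                row.append((f - left) / (centre - left))    # rising slope
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_features(power_spectrum, bank, eps=1e-10):
    """Log energy in each triangular filter for one frame."""
    return [
        math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps)
        for row in bank
    ]


def multi_resolution_mfbf(power_spectrum, n_fft, sample_rate,
                          filter_counts=(20, 40, 60)):
    """One MFBF vector per filter bank; the filter counts are assumed values."""
    return [
        log_mel_features(power_spectrum,
                         mel_filter_bank(n, n_fft, sample_rate))
        for n in filter_counts
    ]


# Usage: a flat dummy power spectrum standing in for one real speech frame
frame_power = [1.0] * (512 // 2 + 1)
features = multi_resolution_mfbf(frame_power, n_fft=512, sample_rate=16000)
print([len(f) for f in features])
```

In the method described above, each of these vectors would feed its own CNN branch, and the resulting deep features would be concatenated after cross gating.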
