Paper Title

Convolution-Based Channel-Frequency Attention for Text-Independent Speaker Verification

Paper Authors

Jingyu Li, Yusheng Tian, Tan Lee

Paper Abstract

Deep convolutional neural networks (CNNs) have been applied to extracting speaker embeddings with significant success in speaker verification. Incorporating the attention mechanism has been shown to be effective in improving model performance. This paper presents an efficient two-dimensional convolution-based attention module, namely C2D-Att. Interaction between convolution channels and frequencies is incorporated into the attention calculation by lightweight convolution layers, which requires only a small number of parameters. Fine-grained attention weights are produced to represent channel- and frequency-specific information. The weights are imposed on the input features to improve the representation ability for speaker modeling. C2D-Att is integrated into a modified version of ResNet for speaker embedding extraction. Experiments are conducted on VoxCeleb datasets. The results show that C2D-Att is effective in generating discriminative attention maps and outperforms other attention methods. The proposed model shows robust performance across different model sizes and achieves state-of-the-art results.
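The abstract describes the core idea at a high level: pool a CNN feature map over time, run a lightweight 2-D convolution over the resulting channel-frequency plane, and impose the resulting fine-grained weights back on the input features. The sketch below illustrates that flow in NumPy; the pooling choice, kernel size, and sigmoid gating are assumptions for illustration, not the paper's exact C2D-Att design.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive 'same'-padded 2-D cross-correlation of a single-channel map."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

def channel_freq_attention(feats, kernel):
    """Hypothetical channel-frequency attention, loosely following the
    abstract's description (not the paper's exact module).

    feats:  (C, F, T) feature map from a CNN layer.
    kernel: small 2-D kernel acting on the (channel, frequency) plane.
    Returns a reweighted feature map with the same shape as feats.
    """
    # Time-average pooling yields one (channel, frequency) statistics map.
    cf_map = feats.mean(axis=2)                      # (C, F)
    # A lightweight convolution mixes neighbouring channels/frequencies,
    # so each weight depends on channel-frequency interaction.
    logits = conv2d_same(cf_map, kernel)             # (C, F)
    # Sigmoid turns logits into fine-grained attention weights in (0, 1).
    weights = 1.0 / (1.0 + np.exp(-logits))          # (C, F)
    # Impose the weights on the input features (broadcast over time).
    return feats * weights[:, :, None]

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 50))   # 8 channels, 16 freq bins, 50 frames
k = rng.standard_normal((3, 3)) * 0.1  # hypothetical 3x3 attention kernel
y = channel_freq_attention(x, k)
print(y.shape)  # → (8, 16, 50)
```

Because the weights lie in (0, 1) and are shared across time, the module rescales each channel-frequency cell without changing the feature map's shape, which is what lets it drop into a ResNet block cheaply.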
