Paper Title

Complex-Valued Time-Frequency Self-Attention for Speech Dereverberation

Authors

Vinay Kothapally, John H. L. Hansen

Abstract

Several speech processing systems have demonstrated considerable performance improvements when deep complex neural networks (DCNN) are coupled with self-attention (SA) networks. However, the majority of DCNN-based studies on speech dereverberation that employ self-attention do not explicitly account for the inter-dependencies between real and imaginary features when computing attention. In this study, we propose a complex-valued T-F attention (TFA) module that models spectral and temporal dependencies by computing two-dimensional attention maps across time and frequency dimensions. We validate the effectiveness of our proposed complex-valued TFA module with the deep complex convolutional recurrent network (DCCRN) using the REVERB challenge corpus. Experimental findings indicate that integrating our complex-TFA module with DCCRN improves overall speech quality and performance of back-end speech applications, such as automatic speech recognition, compared to earlier approaches for self-attention.
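To illustrate the core idea, here is a minimal NumPy sketch of time-frequency attention applied jointly to the real and imaginary parts of a complex spectrogram. This is a simplified illustration, not the authors' DCCRN module: the function names and the pooling-plus-sigmoid weighting are assumptions. The key point it demonstrates is that one shared 2-D attention map, derived from the complex magnitude, scales both components together, so the real/imaginary inter-dependency is not ignored.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tf_attention(real, imag):
    """Toy time-frequency attention on a complex spectrogram.

    real, imag: (F, T) arrays for F frequency bins and T frames.
    Pools the complex magnitude along each axis, squashes the pooled
    statistics through a sigmoid to obtain per-frame and per-bin
    weights, and forms a 2-D attention map as their outer product.
    The same map scales both real and imaginary parts, so the two
    components are attended jointly rather than independently.
    """
    mag = np.sqrt(real**2 + imag**2)        # (F, T) complex magnitude
    att_t = sigmoid(mag.mean(axis=0))       # (T,) temporal attention
    att_f = sigmoid(mag.mean(axis=1))       # (F,) spectral attention
    att_2d = np.outer(att_f, att_t)         # (F, T) joint T-F map
    return real * att_2d, imag * att_2d

# Example: 4 frequency bins x 5 frames of a random complex spectrum
rng = np.random.default_rng(0)
re = rng.standard_normal((4, 5))
im = rng.standard_normal((4, 5))
re_out, im_out = tf_attention(re, im)
print(re_out.shape, im_out.shape)  # (4, 5) (4, 5)
```

Because the attention weights lie in (0, 1) and multiply both parts by the same factor, each T-F bin keeps its phase while its magnitude is re-weighted; a trained module would learn these weights rather than derive them from fixed pooling.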
