Paper Title

Attention-based scaling adaptation for target speech extraction

Authors

Jiangyu Han, Wei Rao, Yanhua Long, Jiaen Liang

Abstract

Target speech extraction has attracted widespread attention in recent years. In this work, we investigate the dynamic interaction between different mixtures and the target speaker in order to exploit discriminative target-speaker clues. We propose a special attention mechanism that introduces no additional parameters in the scaling adaptation layer, so that the network adapts better toward extracting the target speech. Furthermore, by introducing a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) exploits the target-speaker clues more efficiently. Experimental results on the spatialized reverberant WSJ0 2-mix dataset demonstrate that the proposed method effectively improves target speech extraction performance. We also find that, under the same network configuration, ASA in the single-channel condition achieves performance gains competitive with those obtained from two-channel mixtures with inter-microphone phase difference (IPD) features.
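The abstract does not give the ASA equations, so the following is only a minimal sketch of what a parameter-free attention step in a scaling adaptation layer might look like. It is not the authors' formulation. The function names, the scaled dot-product score, the softmax pooling over time, and the elementwise scaling rule are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scaling(mixture_emb, speaker_emb, activations):
    """Hypothetical parameter-free attention-based scaling (illustrative only).

    mixture_emb: T frames, each a list of D floats (mixture embedding matrix)
    speaker_emb: list of D floats (target speaker embedding)
    activations: T frames, each a list of D floats (layer output to adapt)
    Returns the attention weights over time and the scaled activations.
    """
    dim = len(speaker_emb)
    # Assumption: score each mixture frame against the speaker embedding with
    # a scaled dot product, so no learned weights are involved.
    scores = [
        sum(m * s for m, s in zip(frame, speaker_emb)) / math.sqrt(dim)
        for frame in mixture_emb
    ]
    weights = softmax(scores)
    # Assumption: pool the mixture embedding matrix over time using the weights.
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, mixture_emb))
        for d in range(dim)
    ]
    # Assumption: the scaling vector is the elementwise product of the pooled
    # mixture vector and the speaker embedding.
    scale = [p * s for p, s in zip(pooled, speaker_emb)]
    adapted = [[a * c for a, c in zip(frame, scale)] for frame in activations]
    return weights, adapted
```

In this sketch the only inputs are the embeddings and the activations themselves, which is one way to read "without introducing any additional parameters"; the actual layer in the paper may differ in every one of the assumed steps.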
