Array Configuration-Agnostic Personalized Speech Enhancement using Long-Short-Term Spatial Coherence

Paper Title

Array Configuration-Agnostic Personalized Speech Enhancement using Long-Short-Term Spatial Coherence

Authors

Hsu, Yicheng; Lee, Yonghan; Bai, Mingsian R.

Abstract

Personalized speech enhancement (PSE) has been an active research area for suppressing speech-like interferers such as competing speakers or TV dialogue. Compared with single-channel approaches, multichannel PSE systems can be more effective in adverse acoustic conditions because they leverage the spatial information in the microphone signals. However, implementing multichannel PSE systems that accommodate the wide range of array topologies found in household applications can be challenging. To develop an array-configuration-agnostic PSE system, we define a spatial feature termed the long-short-term spatial coherence (LSTSC) and use it as the input feature to a convolutional recurrent network that monitors the voice activity of the target speaker. As a further refinement, an equivalent rectangular bandwidth (ERB)-scaled LSTSC feature can be used to reduce the computational cost. Experiments were conducted to compare the proposed PSE systems, in both the complete and the simplified versions, with two baselines, using unseen room responses and array configurations in the presence of TV noise and competing speakers. The results demonstrated that the proposed multichannel PSE network trained with the LSTSC feature achieved superior enhancement performance without precise knowledge of the array configurations and room responses.
