Individualized Conditioning and Negative Distances for Speaker Separation

Paper Title

Individualized Conditioning and Negative Distances for Speaker Separation

Authors

Sun, Tao; Abuhajar, Nidal; Gong, Shuyu; Wang, Zhewei; Smith, Charles D.; Wang, Xianhui; Xu, Li; Liu, Jundong

Abstract

Speaker separation aims to extract multiple voices from a mixed signal. In this paper, we propose two speaker-aware designs to improve the existing speaker separation solutions. The first model is a speaker conditioning network that integrates speech samples to generate individualized speaker conditions, which then provide informed guidance for a separation module to produce well-separated outputs. The second design aims to reduce non-target voices in the separated speech. To this end, we propose negative distances to penalize the appearance of any non-target voice in the channel outputs, and positive distances to drive the separated voices closer to the clean targets. We explore two different setups, weighted-sum and triplet-like, to integrate these two distances to form a combined auxiliary loss for the separation networks. Experiments conducted on LibriMix demonstrate the effectiveness of our proposed models.
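
The abstract names two ways of combining the positive and negative distances but gives no formulas. The following is a minimal sketch of what a weighted-sum loss and a triplet-like loss could look like, not the authors' implementation. The Euclidean metric, the function names, the weights `alpha` and `beta`, and the `margin` value are all assumptions made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length sequences (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_sum_loss(output, target, non_target, alpha=1.0, beta=0.5):
    """Hypothetical weighted-sum combination of the two distances.

    The positive distance (output to clean target) is minimized, while the
    negative distance (output to the non-target voice) is rewarded for being
    large. alpha and beta are illustrative weights, not values from the paper.
    """
    d_pos = distance(output, target)
    d_neg = distance(output, non_target)
    return alpha * d_pos - beta * d_neg


def triplet_like_loss(output, target, non_target, margin=1.0):
    """Hypothetical triplet-like combination with a hinge.

    The loss is zero once the output is closer to the target than to the
    non-target voice by at least `margin` (an assumed hyperparameter).
    """
    d_pos = distance(output, target)
    d_neg = distance(output, non_target)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    # Toy vectors standing in for a channel output, its clean target,
    # and the other speaker's voice.
    out, tgt, other = [1.0, 0.0], [1.0, 0.0], [0.0, 0.0]
    print(weighted_sum_loss(out, tgt, other))
    print(triplet_like_loss(out, tgt, other))
```

In this sketch the weighted-sum form keeps rewarding a larger negative distance without bound, whereas the triplet-like form stops contributing once the margin is satisfied. Whether the paper's formulation shares either property is not stated in the abstract.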
