Paper Title

Individualized Conditioning and Negative Distances for Speaker Separation

Authors

Tao Sun, Nidal Abuhajar, Shuyu Gong, Zhewei Wang, Charles D. Smith, Xianhui Wang, Li Xu, Jundong Liu

Abstract

Speaker separation aims to extract multiple voices from a mixed signal. In this paper, we propose two speaker-aware designs to improve the existing speaker separation solutions. The first model is a speaker conditioning network that integrates speech samples to generate individualized speaker conditions, which then provide informed guidance for a separation module to produce well-separated outputs. The second design aims to reduce non-target voices in the separated speech. To this end, we propose negative distances to penalize the appearance of any non-target voice in the channel outputs, and positive distances to drive the separated voices closer to the clean targets. We explore two different setups, weighted-sum and triplet-like, to integrate these two distances to form a combined auxiliary loss for the separation networks. Experiments conducted on LibriMix demonstrate the effectiveness of our proposed models.
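
The two ways of combining the positive and negative distances can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the distance measure, the weighting, or the margin, so the mean squared error distance, the `alpha` weight, and the `margin` value below are all assumptions chosen only to make the idea concrete.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length signals.

    Assumption: MSE stands in for whatever distance the paper uses.
    """
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def weighted_sum_loss(est, target, non_target, alpha=0.5):
    """Weighted-sum setup (hypothetical form).

    The positive distance pulls the estimate toward the clean target;
    the negative distance, subtracted with weight alpha, rewards the
    estimate for being far from the non-target voice.
    """
    d_pos = mse(est, target)
    d_neg = mse(est, non_target)
    return d_pos - alpha * d_neg


def triplet_like_loss(est, target, non_target, margin=1.0):
    """Triplet-like setup (hypothetical form).

    Hinge on the gap between the two distances: the loss is zero once
    the estimate is closer to the target than to the non-target voice
    by at least `margin`.
    """
    d_pos = mse(est, target)
    d_neg = mse(est, non_target)
    return max(0.0, d_pos - d_neg + margin)


# Toy signals: the target and non-target voices do not overlap.
target = [1.0, 0.0]
non_target = [0.0, 1.0]

good_estimate = [1.0, 0.0]  # matches the target exactly
bad_estimate = [0.0, 1.0]   # leaks the non-target voice entirely

good_ws = weighted_sum_loss(good_estimate, target, non_target)
bad_ws = weighted_sum_loss(bad_estimate, target, non_target)
good_tr = triplet_like_loss(good_estimate, target, non_target)
bad_tr = triplet_like_loss(bad_estimate, target, non_target)
```

In both variants the estimate that leaks the non-target voice receives the larger loss. In a real system such a term would be added as an auxiliary loss alongside the main separation objective, with the distance computed on waveforms or learned embeddings as the method requires.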
