Paper title
Individualized Conditioning and Negative Distances for Speaker Separation
Paper authors
Paper abstract
Speaker separation aims to extract multiple voices from a mixed signal. In this paper, we propose two speaker-aware designs to improve existing speaker separation solutions. The first is a speaker conditioning network that integrates speech samples to generate individualized speaker conditions, which then provide informed guidance for a separation module to produce well-separated outputs. The second aims to reduce non-target voices in the separated speech. To this end, we propose negative distances to penalize the appearance of any non-target voice in the channel outputs, and positive distances to drive the separated voices closer to the clean targets. We explore two setups, weighted-sum and triplet-like, that integrate these two distances into a combined auxiliary loss for the separation networks. Experiments conducted on LibriMix demonstrate the effectiveness of our proposed models.
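The two ways of combining positive and negative distances can be illustrated with a minimal sketch. The abstract does not specify the distance measure, the weights, or the margin, so everything below is an assumption for illustration only: mean squared error stands in for the distance, and the names `alpha`, `beta`, and `margin` as well as their default values are hypothetical, not taken from the paper.

```python
from typing import Sequence

Signal = Sequence[float]


def mse(a: Signal, b: Signal) -> float:
    """Mean squared error between two equal-length signals.

    Assumption: used here only as a stand-in distance measure.
    """
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def weighted_sum_loss(est: Signal, target: Signal, non_target: Signal,
                      alpha: float = 1.0, beta: float = 0.1) -> float:
    """Weighted-sum combination (hypothetical weights alpha and beta).

    The positive distance (estimate vs. clean target) is minimized, while
    the negative distance (estimate vs. non-target voice) is subtracted,
    so a larger separation from the non-target voice lowers the loss.
    """
    d_pos = mse(est, target)
    d_neg = mse(est, non_target)
    return alpha * d_pos - beta * d_neg


def triplet_like_loss(est: Signal, target: Signal, non_target: Signal,
                      margin: float = 1.0) -> float:
    """Triplet-like combination with a hinge (hypothetical margin).

    The loss is zero once the estimate is closer to the target than to
    the non-target voice by at least `margin`.
    """
    d_pos = mse(est, target)
    d_neg = mse(est, non_target)
    return max(0.0, d_pos - d_neg + margin)


# Toy two-sample "signals": the estimate matches the target exactly.
est = [1.0, 0.0]
target = [1.0, 0.0]
non_target = [0.0, 1.0]

ws = weighted_sum_loss(est, target, non_target)
tl = triplet_like_loss(est, target, non_target)
tl_wide = triplet_like_loss(est, target, non_target, margin=2.0)
```

In this sketch the weighted-sum form can become negative, whereas the hinge in the triplet-like form clips the value at zero once the margin is satisfied. In a separation network, either term would presumably be added as an auxiliary loss alongside the main separation objective, computed per output channel.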