High-Resolution Speaker Counting in Reverberant Rooms Using CRNN with Ambisonics Features

Paper Title

High-Resolution Speaker Counting In Reverberant Rooms Using CRNN With Ambisonics Features

Authors

Pierre-Amaury Grumiaux, Srdjan Kitic, Laurent Girin, Alexandre Guérin

Abstract

Speaker counting is the task of estimating the number of people that are simultaneously speaking in an audio recording. For several audio processing tasks such as speaker diarization, separation, localization and tracking, knowing the number of speakers at each timestep is a prerequisite, or at least it can be a strong advantage, in addition to enabling a low latency processing. For that purpose, we address the speaker counting problem with a multichannel convolutional recurrent neural network which produces an estimation at a short-term frame resolution. We trained the network to predict up to 5 concurrent speakers in a multichannel mixture, with simulated data including many different conditions in terms of source and microphone positions, reverberation, and noise. The network can predict the number of speakers with good accuracy at frame resolution.
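
To make "number of speakers at each timestep" concrete, the following is a minimal sketch of how frame-level count targets could be derived from per-speaker activity. It is not taken from the paper: the function names, the boolean activity matrix, and the choice to treat counting as classification over the classes 0 to 5 are all assumptions made for illustration. Only the cap of 5 concurrent speakers comes from the abstract.

```python
import numpy as np

# The abstract states the network predicts up to 5 concurrent speakers.
MAX_SPEAKERS = 5


def frame_speaker_counts(activity, max_speakers=MAX_SPEAKERS):
    """Count active speakers in every frame.

    activity: array-like of shape (num_speakers, num_frames), truthy where
    a speaker is talking in that frame (hypothetical input format).
    Returns an integer array of shape (num_frames,), clipped to max_speakers.
    """
    activity = np.asarray(activity, dtype=bool)
    counts = activity.sum(axis=0)
    return np.minimum(counts, max_speakers)


def one_hot_counts(counts, max_speakers=MAX_SPEAKERS):
    """Turn per-frame counts into one-hot targets over classes 0..max_speakers.

    Assumes counting is framed as classification, which the abstract
    does not state explicitly.
    """
    return np.eye(max_speakers + 1, dtype=np.float32)[counts]


# Three speakers, four frames.
activity = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
counts = frame_speaker_counts(activity)
targets = one_hot_counts(counts)
```

Here `counts` holds one integer per frame and `targets` has one row per frame with six columns, one for each possible count from 0 to 5.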
