Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs

Paper title

Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs

Authors

David Diaz-Guerra, Antonio Miguel, Jose R. Beltran

Abstract

In this paper, we present a new model for Direction of Arrival (DOA) estimation of sound sources based on an Icosahedral Convolutional Neural Network (CNN) applied over SRP-PHAT power maps computed from the signals received by a microphone array. This icosahedral CNN is equivariant to the 60 rotational symmetries of the icosahedron, which represent a good approximation of the continuous space of spherical rotations, and can be implemented using standard 2D convolutional layers, having a lower computational cost than most of the spherical CNNs. In addition, instead of using fully connected layers after the icosahedral convolutions, we propose a new soft-argmax function that can be seen as a differentiable version of the argmax function and allows us to solve the DOA estimation as a regression problem interpreting the output of the convolutional layers as a probability distribution. We prove that using models that fit the equivariances of the problem allows us to outperform other state-of-the-art models with a lower computational cost and more robustness, obtaining root mean square localization errors lower than 10° even in scenarios with a reverberation time $T_{60}$ of 1.5 s.
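The soft-argmax idea mentioned in the abstract can be illustrated with a generic sketch: treat a map of scores over candidate directions as a probability distribution (via a softmax) and return its expected direction, which is differentiable where a hard argmax is not. This is only an illustration under stated assumptions and not necessarily the authors' exact formulation; the function name `soft_argmax`, the `beta` sharpness parameter, and the six-direction grid used below are all hypothetical and do not come from the paper.

```python
import numpy as np

def soft_argmax(scores, directions, beta=1.0):
    """Differentiable stand-in for argmax over candidate directions.

    scores:     (N,) real-valued map, one score per candidate direction
    directions: (N, 3) unit vectors of the candidate directions
    beta:       hypothetical sharpness parameter (an assumption of this sketch)
    Returns the (normalized) expected direction and the probabilities.
    """
    scores = np.asarray(scores, dtype=float)
    directions = np.asarray(directions, dtype=float)

    # Softmax: interpret the scores as a probability distribution.
    z = beta * (scores - scores.max())  # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()

    # Expected direction under that distribution, projected back to the unit sphere.
    mean = probs @ directions
    norm = np.linalg.norm(mean)
    doa = mean / norm if norm > 0 else mean
    return doa, probs

# Toy grid (assumption): the six axis-aligned unit vectors, not an icosahedral grid.
grid = np.array([
    [1, 0, 0], [-1, 0, 0],
    [0, 1, 0], [0, -1, 0],
    [0, 0, 1], [0, 0, -1],
], dtype=float)

toy_scores = np.array([0.0, 0.0, 5.0, 0.0, 0.0, 0.0])  # peak at +y
doa, probs = soft_argmax(toy_scores, grid, beta=10.0)
```

With a sharp peak and a large `beta`, the estimate approaches the direction a hard argmax would select; with a smaller `beta`, it becomes a weighted average of neighbouring directions, which is what lets the output be trained as a regression target.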
