Spatio-Temporal Representation Learning Enhanced Source Cell-phone Recognition from Speech Recordings

Paper Title

Spatio-Temporal Representation Learning Enhanced Source Cell-phone Recognition from Speech Recordings

Authors

Zeng, Chunyan; Feng, Shixiong; Wang, Zhifeng; Wan, Xiangkui; Chen, Yunfan; Zhao, Nan

Abstract

Existing source cell-phone recognition methods lack a long-term feature characterization of the source device, so features related to the source cell-phone are represented inaccurately and recognition accuracy is insufficient. In this paper, we propose a source cell-phone recognition method based on spatio-temporal representation learning, which consists of two main parts: extraction of sequential Gaussian mean matrix features, and construction of a recognition model based on spatio-temporal representation learning. In the feature extraction part, based on an analysis of the time-series representation of recording source signals, we exploit the sensitivity of the Gaussian mixture model to the data distribution to extract a sequential Gaussian mean matrix with both long-term and short-term representation ability. In the model construction part, we design a structured spatio-temporal representation learning network, C3D-BiLSTM, to fully characterize the spatio-temporal information. It combines a 3D convolutional network and a bidirectional long short-term memory network to learn representations of short-term spectral information and long-term fluctuation information, and achieves accurate cell-phone recognition by fusing the spatio-temporal features of the recording source signals. On the CCNU_Mobile dataset, the method achieves an average accuracy of 99.03% for closed-set recognition of 45 cell-phones and an average recognition rate of 98.18% in small-sample experiments, outperforming existing state-of-the-art methods. The experimental results show that the method performs excellently in multi-class cell-phone recognition.
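
The abstract does not spell out how the sequential Gaussian mean matrix is computed. As a minimal sketch, assume the recording has already been converted to frame-level features (for example MFCCs), that a diagonal-covariance Gaussian mixture model (a universal background model, UBM) has been trained beforehand, and that each segment's mean matrix is obtained by classic relevance-MAP adaptation of the UBM means. The function name `sequential_gaussian_mean_matrix`, the equal-length segmentation, and the relevance factor of 16 are hypothetical choices for illustration, not the authors' settings.

```python
import numpy as np


def sequential_gaussian_mean_matrix(frames, ubm_weights, ubm_means, ubm_vars,
                                    n_segments, relevance=16.0):
    """Hypothetical sketch: split frame-level features into sequential
    segments and MAP-adapt the UBM means on each one (assumed procedure).

    frames:      (T, D) frame-level features, e.g. MFCCs (assumed input)
    ubm_weights: (K,)   mixture weights of a pre-trained GMM/UBM
    ubm_means:   (K, D) component means
    ubm_vars:    (K, D) diagonal covariances
    returns:     (n_segments, K, D) stack of adapted mean matrices
    """
    segments = np.array_split(frames, n_segments, axis=0)
    out = np.empty((n_segments, *ubm_means.shape))
    log_det = np.sum(np.log(2.0 * np.pi * ubm_vars), axis=1)   # (K,)
    for i, seg in enumerate(segments):
        # Log-likelihood of every frame under every diagonal Gaussian: (T_i, K)
        diff = seg[:, None, :] - ubm_means[None, :, :]
        log_lik = -0.5 * (np.sum(diff ** 2 / ubm_vars[None], axis=2) + log_det[None])
        log_post = np.log(ubm_weights)[None] + log_lik
        # Responsibilities, shifted by the row maximum for numerical stability
        log_post -= log_post.max(axis=1, keepdims=True)
        gamma = np.exp(log_post)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # Zeroth- and first-order statistics per component
        n_k = gamma.sum(axis=0)                          # (K,)
        f_k = gamma.T @ seg                              # (K, D)
        e_k = f_k / np.maximum(n_k[:, None], 1e-10)      # posterior-weighted mean
        # Relevance-MAP: interpolate between segment statistics and UBM prior
        alpha = (n_k / (n_k + relevance))[:, None]
        out[i] = alpha * e_k + (1.0 - alpha) * ubm_means
    return out


# Usage with random stand-in data (not real recordings)
rng = np.random.default_rng(0)
K, D, T = 8, 13, 400
frames = rng.normal(size=(T, D))
w = np.full(K, 1.0 / K)
mu = rng.normal(size=(K, D))
var = np.ones((K, D))
sgmm = sequential_gaussian_mean_matrix(frames, w, mu, var, n_segments=10)
print(sgmm.shape)  # (10, 8, 13)
```

Stacking the per-segment K x D mean matrices along a time axis is one way to read the abstract's "long-term and short-term representation ability": each matrix summarizes a short stretch of the recording, and the sequence of matrices tracks how those statistics drift over the whole recording.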
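
The following is a rough PyTorch sketch of how a 3D convolutional front end and a bidirectional LSTM can be chained over a stack of Gaussian mean matrices. The class name `C3DBiLSTM`, the input layout (batch, 1, segments, mixture components, feature dimension), the layer widths, and the fusion step (concatenating the final forward and backward hidden states) are all assumptions; the abstract does not give the authors' architecture. Only the 45-way output follows from the 45-phone closed-set experiment.

```python
import torch
import torch.nn as nn


class C3DBiLSTM(nn.Module):
    """Hypothetical 3D-CNN + BiLSTM classifier; all sizes are assumptions.

    Input: (batch, 1, S, K, D), a stack of S sequential K x D Gaussian
    mean matrices. The 3D convolutions learn local patterns within and
    across neighbouring segments; the BiLSTM models the whole S sequence.
    """

    def __init__(self, n_classes=45, channels=(16, 32), hidden=64):
        super().__init__()
        c1, c2 = channels
        self.c3d = nn.Sequential(
            nn.Conv3d(1, c1, kernel_size=3, padding=1),
            nn.BatchNorm3d(c1),
            nn.ReLU(),
            # Pool only the K and D axes so the segment axis S is preserved
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(c1, c2, kernel_size=3, padding=1),
            nn.BatchNorm3d(c2),
            nn.ReLU(),
            # Collapse K x D to 1 x 1 while keeping every segment
            nn.AdaptiveAvgPool3d((None, 1, 1)),
        )
        self.bilstm = nn.LSTM(c2, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        feats = self.c3d(x)                          # (B, C, S, 1, 1)
        seq = feats.flatten(2).transpose(1, 2)       # (B, S, C)
        _, (h_n, _) = self.bilstm(seq)               # h_n: (2, B, hidden)
        fused = torch.cat([h_n[0], h_n[1]], dim=1)   # forward + backward states
        return self.fc(fused)                        # (B, n_classes) logits


# Usage on random input shaped like 10 segments of 8 x 13 mean matrices
torch.manual_seed(0)
model = C3DBiLSTM(n_classes=45)
x = torch.randn(4, 1, 10, 8, 13)
logits = model(x)
print(logits.shape)  # torch.Size([4, 45])
```

Pooling only the component and feature axes leaves one vector per segment, so the BiLSTM receives a sequence whose length equals the number of segments and can capture the long-term fluctuation across them.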
