Visual Transformers for Primates Classification and Covid Detection

Paper Title

Visual Transformers for Primates Classification and Covid Detection

Authors

Steffen Illium, Robert Müller, Andreas Sedlmeier, Claudia Linnhoff-Popien

Abstract

We apply the vision transformer, a deep machine learning model built around the attention mechanism, to mel-spectrogram representations of raw audio recordings. When adding mel-based data augmentation techniques and sample weighting, we achieve comparable performance on both tasks of ComParE21 (the PRS and CCS challenges), outperforming most single-model baselines. We further introduce overlapping vertical patching and evaluate the influence of parameter configurations.

Index Terms: audio classification, attention, mel-spectrogram, unbalanced datasets, computational paralinguistics
