Paper Title

Visual Transformers for Primates Classification and Covid Detection

Authors

Steffen Illium, Robert Müller, Andreas Sedlmeier, Claudia Linnhoff-Popien

Abstract

We apply the vision transformer, a deep machine learning model built around the attention mechanism, to mel-spectrogram representations of raw audio recordings. With mel-based data augmentation and sample weighting added, we achieve comparable performance on both tasks of ComParE21 (the PRS and CCS challenges), outperforming most single-model baselines. We further introduce overlapping vertical patching and evaluate the influence of parameter configurations.

Index Terms: audio classification, attention, mel-spectrogram, unbalanced datasets, computational paralinguistics
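
The abstract does not give the patch geometry, so the following is only a minimal sketch of what overlapping vertical patching of a mel-spectrogram could look like. The function name `vertical_patches` and the `width` and `stride` parameters are hypothetical, chosen for illustration; the authors' implementation may differ. Each patch spans the full mel axis, and a stride smaller than the width makes neighbouring patches share frames.

```python
import numpy as np


def vertical_patches(spec, width, stride):
    """Cut a (n_mels, n_frames) spectrogram into full-height vertical strips.

    width and stride are measured in time frames (hypothetical parameters).
    A stride smaller than width makes neighbouring strips overlap.
    Trailing frames that do not fill a whole strip are dropped.
    """
    n_mels, n_frames = spec.shape
    if width > n_frames:
        raise ValueError("patch width exceeds the number of frames")
    starts = range(0, n_frames - width + 1, stride)
    # Result shape: (n_patches, n_mels, width)
    return np.stack([spec[:, s:s + width] for s in starts])


# Toy stand-in for a mel-spectrogram: 4 mel bins, 10 time frames.
spec = np.arange(4 * 10, dtype=np.float32).reshape(4, 10)
patches = vertical_patches(spec, width=4, stride=2)
print(patches.shape)
```

With `width=4` and `stride=2`, the last two frames of each patch are the first two frames of the next one. In a vision transformer, each such strip would then be flattened and linearly projected into a token, a step this sketch leaves out.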
