Paper Title

Visualizations of Complex Sequences of Family-Infant Vocalizations Using Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features

Authors

Jialu Li, Mark Hasegawa-Johnson, Nancy L. McElwain

Abstract

In the U.S., approximately 15-17% of children 2-8 years of age are estimated to have at least one diagnosed mental, behavioral, or developmental disorder. However, such disorders often go undiagnosed, and the ability to evaluate and treat disorders in the first years of life is limited. To analyze infant developmental changes, previous studies have shown that advanced ML models excel at classifying infant and/or parent vocalizations collected using cell phones, video, or audio-only recording devices such as LENA. In this study, we pilot test the audio component of a new infant wearable multi-modal device that we have developed, called LittleBeats (LB). The LB audio pipeline is advanced in that it provides reliable labels for both speaker diarization and vocalization classification tasks, whereas other platforms only record audio and/or provide speaker diarization labels. We leverage wav2vec 2.0 to obtain superior and more nuanced results with the LB family audio stream. We use a bag-of-audio-words method with wav2vec 2.0 features to create high-level visualizations to understand family-infant vocalization interactions. We demonstrate that our high-quality visualizations capture major types of family vocalization interactions, in categories indicative of mental, behavioral, and developmental health, for both labeled and unlabeled LB audio.
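
The abstract names a bag-of-audio-words method but does not describe how the codebook is built, so the following is only a minimal sketch of the general technique, not the authors' pipeline. It assumes a k-means codebook (an assumption, since the abstract does not specify the clustering method) and uses random vectors as a stand-in for wav2vec 2.0 frame features. The function names `fit_codebook` and `bag_of_audio_words` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(frame, centers):
    """Index of the codebook entry ("audio word") closest to a frame."""
    return min(range(len(centers)), key=lambda j: sq_dist(frame, centers[j]))


def fit_codebook(frames, k, n_iter=10, seed=0):
    """Learn k audio words with plain k-means (assumed clustering method)."""
    rng = random.Random(seed)
    centers = [list(f) for f in rng.sample(frames, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for f in frames:
            groups[nearest(f, centers)].append(f)
        for j, members in enumerate(groups):
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bag_of_audio_words(frames, centers):
    """Normalized histogram of audio-word counts for one clip."""
    counts = [0] * len(centers)
    for f in frames:
        counts[nearest(f, centers)] += 1
    return [c / len(frames) for c in counts]


# Placeholder data: random 8-dimensional vectors stand in for
# wav2vec 2.0 frame features, which are not reproduced here.
rng = random.Random(1)
train_frames = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
codebook = fit_codebook(train_frames, k=4)

clip_frames = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(30)]
hist = bag_of_audio_words(clip_frames, codebook)
```

Each clip becomes a fixed-length histogram regardless of its duration, which is what makes such representations convenient to compare or project for visualization.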
