Paper Title

Multi-modal Depression Estimation based on Sub-attentional Fusion

Authors

Wei, Ping-Cheng, Peng, Kunyu, Roitberg, Alina, Yang, Kailun, Zhang, Jiaming, Stiefelhagen, Rainer

Abstract

Failure to timely diagnose and effectively treat depression leads to over 280 million people suffering from this psychological disorder worldwide. The information cues of depression can be harvested from diverse heterogeneous resources, e.g., audio, visual, and textual data, raising demand for new effective multi-modal fusion approaches for automatic estimation. In this work, we tackle the task of automatically identifying depression from multi-modal data and introduce a sub-attention mechanism for linking heterogeneous information while leveraging Convolutional Bidirectional LSTM as our backbone. To validate this idea, we conduct extensive experiments on the public DAIC-WOZ benchmark for depression assessment featuring different evaluation modes and taking gender-specific biases into account. The proposed model yields effective results with 0.89 precision and 0.70 F1-score in detecting major depression and 4.92 MAE in estimating the severity. Our attention-based fusion module consistently outperforms conventional late fusion approaches and achieves competitive performance compared to the previously published depression estimation frameworks, while learning to diagnose the disorder end-to-end and relying on far fewer preprocessing steps.
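
To make the fusion idea concrete, below is a minimal PyTorch sketch of attention-weighted multi-modal fusion over Conv-BiLSTM encoders, in the spirit of the sub-attentional module described above. This is an illustration under stated assumptions, not the authors' implementation: the class name SubAttentionalFusion, the per-modality feature dimensions (40 audio, 68 visual, 300 text), the mean temporal pooling, and the single severity-regression head are all hypothetical choices.

```python
import torch
import torch.nn as nn


class SubAttentionalFusion(nn.Module):
    """Hypothetical sketch: one Conv-BiLSTM encoder per modality, whose
    pooled features are combined via learned attention weights (one
    "sub-attention" scorer per modality) instead of plain late fusion."""

    def __init__(self, in_dims, hidden=64):
        super().__init__()
        # One encoder per modality: 1-D conv followed by a bidirectional LSTM.
        self.encoders = nn.ModuleList(
            nn.ModuleDict({
                "conv": nn.Conv1d(d, hidden, kernel_size=3, padding=1),
                "lstm": nn.LSTM(hidden, hidden, batch_first=True,
                                bidirectional=True),
            })
            for d in in_dims
        )
        feat = 2 * hidden  # BiLSTM concatenates both directions
        # Per-modality attention scorers ("sub-attention" branches).
        self.scorers = nn.ModuleList(nn.Linear(feat, 1) for _ in in_dims)
        # e.g., regression of a depression severity score (assumed head).
        self.head = nn.Linear(feat, 1)

    def forward(self, xs):
        # xs: list of (batch, time, dim_m) tensors, one per modality.
        feats, scores = [], []
        for x, enc, scorer in zip(xs, self.encoders, self.scorers):
            h = torch.relu(enc["conv"](x.transpose(1, 2))).transpose(1, 2)
            h, _ = enc["lstm"](h)
            h = h.mean(dim=1)          # simple temporal pooling (assumption)
            feats.append(h)
            scores.append(scorer(h))   # scalar relevance per modality
        w = torch.softmax(torch.cat(scores, dim=1), dim=1)   # (batch, M)
        fused = (w.unsqueeze(-1) * torch.stack(feats, dim=1)).sum(dim=1)
        return self.head(fused).squeeze(-1)


# Toy usage with audio/visual/text streams of made-up dimensions.
model = SubAttentionalFusion(in_dims=[40, 68, 300])
batch = [torch.randn(2, 50, d) for d in (40, 68, 300)]
print(model(batch).shape)  # torch.Size([2])
```

The design point the sketch tries to capture is that the softmax over per-modality scores lets the network re-weight heterogeneous cues per sample, which is what distinguishes attention-based fusion from conventional late fusion by fixed averaging or concatenation.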
