Paper Title
Toward Multimodal Modeling of Emotional Expressiveness
Paper Authors
Paper Abstract
Emotional expressiveness captures the extent to which a person tends to outwardly display their emotions through behavior. Due to the close relationship between emotional expressiveness and behavioral health, as well as the crucial role that it plays in social interaction, the ability to automatically predict emotional expressiveness stands to spur advances in science, medicine, and industry. In this paper, we explore three related research questions. First, how well can emotional expressiveness be predicted from visual, linguistic, and multimodal behavioral signals? Second, which behavioral modalities are uniquely important to the prediction of emotional expressiveness? Third, which behavioral signals are reliably related to emotional expressiveness? To answer these questions, we add highly reliable transcripts and human ratings of perceived emotional expressiveness to an existing video database and use this data to train, validate, and test predictive models. Our best model shows promising predictive performance on this dataset (RMSE=0.65, R^2=0.45, r=0.74). Multimodal models tend to perform best overall, and models trained on the linguistic modality tend to outperform models trained on the visual modality. Finally, examination of our interpretable models' coefficients reveals a number of visual and linguistic behavioral signals--such as facial action unit intensity, overall word count, and use of words related to social processes--that reliably predict emotional expressiveness.
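As context for the metrics quoted above (RMSE, R^2, Pearson's r) and the coefficient analysis, below is a minimal sketch, not the authors' implementation, of fitting an interpretable ridge-regression baseline on stand-in multimodal features and computing those three metrics with scikit-learn and SciPy. The data, feature layout, and choice of ridge regression are illustrative assumptions; the paper's actual features and model are not specified in this abstract.

```python
# Minimal sketch (not the authors' code): an interpretable linear baseline for
# predicting expressiveness ratings, evaluated with RMSE, R^2, and Pearson's r.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data: rows = video segments; columns = concatenated visual features
# (e.g., facial action unit intensities) and linguistic features (e.g., word
# counts, word-category frequencies); y = human expressiveness ratings.
X = rng.normal(size=(500, 40))
y = X[:, :5].sum(axis=1) + rng.normal(scale=1.0, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = Ridge(alpha=1.0).fit(X_train, y_train)
y_pred = model.predict(X_test)

rmse = mean_squared_error(y_test, y_pred) ** 0.5  # root-mean-squared error
r2 = r2_score(y_test, y_pred)                     # coefficient of determination
r, _ = pearsonr(y_test, y_pred)                   # Pearson correlation

print(f"RMSE={rmse:.2f}, R^2={r2:.2f}, r={r:.2f}")

# Because the model is linear, its coefficients can be inspected directly to see
# which behavioral signals carry the most weight, analogous to the coefficient
# examination described in the abstract.
top = np.argsort(np.abs(model.coef_))[::-1][:5]
print("Most influential feature indices:", top)
```

An interpretable linear model is used here precisely because, as the abstract notes, examining coefficients is what links individual behavioral signals (facial action unit intensity, word count, social-process words) to the predicted expressiveness score.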