Audio-Visual Understanding of Passenger Intents for In-Cabin Conversational Agents

Paper title

Audio-Visual Understanding of Passenger Intents for In-Cabin Conversational Agents

Authors

Eda Okur, Shachi H. Kumar, Saurav Sahay, Lama Nachman

Abstract

Building multimodal dialogue understanding capabilities situated in the in-cabin context is crucial to enhance passenger comfort in autonomous vehicle (AV) interaction systems. To this end, understanding passenger intents from spoken interactions and vehicle vision systems is a crucial component for developing contextual and visually grounded conversational agents for AV. Towards this goal, we explore AMIE (Automated-vehicle Multimodal In-cabin Experience), the in-cabin agent responsible for handling multimodal passenger-vehicle interactions. In this work, we discuss the benefits of a multimodal understanding of in-cabin utterances by incorporating verbal/language input together with the non-verbal/acoustic and visual clues from inside and outside the vehicle. Our experimental results outperformed text-only baselines as we achieved improved performances for intent detection with a multimodal approach.

Paper title