Paper Title
Navigating Connected Memories with a Task-oriented Dialog System
Paper Authors
Paper Abstract
Recent years have seen a growing volume of personal media captured by users, thanks to the advent of smartphones and smart glasses, resulting in large media collections. Although conversation is an intuitive human-computer interface, current efforts focus mostly on single-shot, natural-language-based media retrieval to help users query their media and relive their memories. This severely limits the search functionality, as users can neither ask follow-up queries nor obtain information without first formulating a single-turn query. In this work, we propose dialogs for connected memories as a powerful tool that lets users search their media collections through multi-turn, interactive conversation. To this end, we collect a new task-oriented dialog dataset, COMET, which contains $11.5k$ user<->assistant dialogs (totaling $103k$ utterances) grounded in simulated personal memory graphs. We employ a resource-efficient, two-phase data collection pipeline that uses (1) a novel multimodal dialog simulator that generates synthetic dialog flows grounded in memory graphs, and (2) manual paraphrasing to obtain natural language utterances. We analyze COMET, formulate four main tasks to benchmark meaningful progress, and adopt state-of-the-art language models as strong baselines to highlight the multimodal challenges captured by our dataset.