Paper Title

Aerial Vision-and-Dialog Navigation

Paper Authors

Yue Fan, Winson Chen, Tongzhou Jiang, Chun Zhou, Yi Zhang, Xin Eric Wang

Paper Abstract

The ability to converse with humans and follow natural language commands is crucial for intelligent unmanned aerial vehicles (a.k.a. drones). It can relieve people's burden of holding a controller all the time, allow multitasking, and make drone control more accessible for people with disabilities or with their hands occupied. To this end, we introduce Aerial Vision-and-Dialog Navigation (AVDN), to navigate a drone via natural language conversation. We build a drone simulator with a continuous photorealistic environment and collect a new AVDN dataset of over 3k recorded navigation trajectories with asynchronous human-human dialogs between commanders and followers. The commander provides initial navigation instruction and further guidance by request, while the follower navigates the drone in the simulator and asks questions when needed. During data collection, followers' attention on the drone's visual observation is also recorded. Based on the AVDN dataset, we study the tasks of aerial navigation from (full) dialog history and propose an effective Human Attention Aided Transformer model (HAA-Transformer), which learns to predict both navigation waypoints and human attention.
