Paper Title
Ground then Navigate: Language-guided Navigation in Dynamic Scenes
Paper Authors
Paper Abstract
We investigate the Vision-and-Language Navigation (VLN) problem in the context of autonomous driving in outdoor settings. We solve the problem by explicitly grounding the navigable regions corresponding to the textual command. At each timestamp, the model predicts a segmentation mask corresponding to the intermediate or the final navigable region. Our work contrasts with existing efforts in VLN, which pose this task as a node selection problem, given a discrete connected graph corresponding to the environment. We do not assume the availability of such a discretised map. Our work moves towards continuity in action space, provides interpretability through visual feedback and allows VLN on commands requiring finer manoeuvres like "park between the two cars". Furthermore, we propose a novel meta-dataset CARLA-NAV to allow efficient training and validation. The dataset comprises pre-recorded training sequences and a live environment for validation and testing. We provide extensive qualitative and quantitative empirical results to validate the efficacy of the proposed approach.