Paper Title
EnvEdit: Environment Editing for Vision-and-Language Navigation
Paper Authors
Paper Abstract
In Vision-and-Language Navigation (VLN), an agent needs to navigate through the environment based on natural language instructions. Due to limited available data for agent training and finite diversity in navigation environments, it is challenging for the agent to generalize to new, unseen environments. To address this problem, we propose EnvEdit, a data augmentation method that creates new environments by editing existing environments, which are used to train a more generalizable agent. Our augmented environments can differ from the seen environments in three diverse aspects: style, object appearance, and object classes. Training on these edit-augmented environments prevents the agent from overfitting to existing environments and helps generalize better to new, unseen environments. Empirically, on both the Room-to-Room and the multi-lingual Room-Across-Room datasets, we show that our proposed EnvEdit method gets significant improvements in all metrics on both pre-trained and non-pre-trained VLN agents, and achieves the new state-of-the-art on the test leaderboard. We further ensemble the VLN agents augmented on different edited environments and show that these edit methods are complementary. Code and data are available at https://github.com/jialuli-luka/EnvEdit
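The abstract mentions ensembling VLN agents trained on differently edited environments (style, object appearance, object classes). A minimal sketch of one plausible ensembling scheme, assuming each agent produces a score per candidate navigation action and the ensemble averages those scores; the function and variable names here are illustrative, not from the paper:

```python
# Hypothetical sketch: combine action scores from agents trained on
# differently edited environments by averaging, then pick the best action.
def ensemble_action(agent_scores):
    """agent_scores: list of per-agent score lists, one score per action.
    Returns the index of the action with the highest average score."""
    n_agents = len(agent_scores)
    n_actions = len(agent_scores[0])
    avg = [sum(scores[a] for scores in agent_scores) / n_agents
           for a in range(n_actions)]
    return max(range(n_actions), key=avg.__getitem__)

# Example: three agents (one per edit type) scoring four candidate actions.
style_agent      = [0.1, 2.0, 0.3, 0.5]
appearance_agent = [0.2, 1.5, 0.4, 0.1]
classes_agent    = [0.3, 1.8, 0.2, 0.2]

print(ensemble_action([style_agent, appearance_agent, classes_agent]))  # → 1
```

Averaging logits or probabilities is a common way to exploit complementary models, which matches the abstract's observation that the different edit methods are complementary.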