Layout-aware Dreamer for Embodied Referring Expression Grounding

Paper Title

Layout-aware Dreamer for Embodied Referring Expression Grounding

Authors

Mingxiao Li, Zehao Wang, Tinne Tuytelaars, Marie-Francine Moens

Abstract

In this work, we study the problem of Embodied Referring Expression Grounding, where an agent needs to navigate in a previously unseen environment and localize a remote object described by a concise high-level natural language instruction. When facing such a situation, a human tends to imagine what the destination may look like and to explore the environment based on prior knowledge of the environmental layout, such as the fact that a bathroom is more likely to be found near a bedroom than a kitchen. We have designed an autonomous agent called Layout-aware Dreamer (LAD), which includes two novel modules, the Layout Learner and the Goal Dreamer, to mimic this cognitive decision process. The Layout Learner learns to infer the room category distribution of neighboring unexplored areas along the path for coarse layout estimation, which effectively introduces layout common sense of room-to-room transitions to our agent. To learn an effective exploration of the environment, the Goal Dreamer imagines the destination beforehand. Our agent achieves new state-of-the-art performance on the public leaderboard of the REVERIE dataset in challenging unseen test environments, improving navigation success (SR) by 4.02% and remote grounding success (RGS) by 3.43% over the previous state-of-the-art. The code is released at https://github.com/zehao-wang/LAD
