Paper Title
Embodied Referring Expression for Manipulation Question Answering in Interactive Environment
Paper Authors
Paper Abstract
With the progress of Embodied AI in recent years, embodied agents are expected to perform more complicated tasks in interactive environments. Existing embodied tasks, including Embodied Referring Expression (ERE) and other QA-form tasks, mainly focus on interaction in terms of linguistic instructions. Enabling the agent to actively manipulate objects in the environment during exploration has therefore become a challenging problem for the community. To address this problem, we introduce a new embodied task, Remote Embodied Manipulation Question Answering (REMQA), which combines ERE with manipulation tasks. In the REMQA task, the agent must navigate to a remote position and manipulate the target object in order to answer the question. We build a benchmark dataset for the REMQA task in the AI2-THOR simulator. We also propose a framework based on 3D semantic reconstruction and the modular network paradigm, and evaluate it on the REMQA dataset to validate its effectiveness.