Paper Title
Trans4Map: Revisiting Holistic Bird's-Eye-View Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers
Paper Authors
Paper Abstract
Humans have an innate ability to sense their surroundings: they can extract a spatial representation from egocentric perception and form an allocentric semantic map via spatial transformation and memory updating. However, endowing mobile agents with such a spatial sensing ability remains a challenge, due to two difficulties: (1) previous convolutional models are limited by the local receptive field and thus struggle to capture holistic long-range dependencies during observation; (2) the excessive computational budgets required for success often lead to a separation of the mapping pipeline into stages, rendering the entire mapping process inefficient. To address these issues, we propose an end-to-end one-stage Transformer-based framework for Mapping, termed Trans4Map. Our egocentric-to-allocentric mapping process consists of three steps: (1) an efficient Transformer extracts contextual features from a batch of egocentric images; (2) the proposed Bidirectional Allocentric Memory (BAM) module projects egocentric features into the allocentric memory; (3) a map decoder parses the accumulated memory and predicts the top-down semantic segmentation map. In comparison, Trans4Map achieves state-of-the-art results with 67.2% fewer parameters, while gaining a +3.25% mIoU and a +4.09% mBF1 improvement on the Matterport3D dataset. Code at: https://github.com/jamycheung/Trans4Map.
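The three-step pipeline described above can be sketched as follows. This is a minimal, hypothetical NumPy illustration of the data flow only: the shapes, function names, the fixed linear maps standing in for the Transformer encoder and map decoder, and the simple average-scatter standing in for the BAM module are all illustrative assumptions, not the actual Trans4Map implementation.

```python
import numpy as np

GRID = 16        # allocentric map is GRID x GRID cells (assumed size)
C = 8            # feature channels (assumed)
N_CLASSES = 4    # number of semantic classes (assumed)

def extract_features(images):
    """Step 1 stand-in: per-pixel contextual features from egocentric images.
    (The paper uses an efficient Transformer; here a fixed random linear map.)"""
    rng = np.random.default_rng(0)
    W_proj = rng.standard_normal((3, C))
    return images @ W_proj                     # (B, H, W, C)

def project_to_memory(features, world_xy):
    """Step 2 stand-in for the BAM module: scatter egocentric features
    into an allocentric grid memory by averaging per map cell."""
    memory = np.zeros((GRID, GRID, C))
    counts = np.zeros((GRID, GRID, 1))
    feats = features.reshape(-1, C)
    cells = world_xy.reshape(-1, 2)            # precomputed grid coordinates
    for f, (gx, gy) in zip(feats, cells):
        memory[gy, gx] += f
        counts[gy, gx] += 1
    return memory / np.maximum(counts, 1)      # avoid division by zero

def decode_map(memory):
    """Step 3 stand-in: per-cell semantic class ids from accumulated memory."""
    rng = np.random.default_rng(1)
    W_dec = rng.standard_normal((C, N_CLASSES))
    return (memory @ W_dec).argmax(-1)         # (GRID, GRID)

# Toy run: 2 egocentric RGB frames with assumed per-pixel grid coordinates.
rng = np.random.default_rng(2)
images = rng.random((2, 4, 4, 3))
world_xy = rng.integers(0, GRID, size=(2, 4, 4, 2))
semantic_map = decode_map(project_to_memory(extract_features(images), world_xy))
print(semantic_map.shape)  # (16, 16)
```

Note that real BAM is bidirectional and learned end-to-end; the scatter-and-average above only mirrors the egocentric-to-allocentric projection direction of the data flow.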