Dynamical Audio-Visual Navigation: Catching Unheard Moving Sound Sources in Unmapped 3D Environments

Paper title

Dynamical Audio-Visual Navigation: Catching Unheard Moving Sound Sources in Unmapped 3D Environments

Authors

Younes, Abdelrahman

Abstract

Recent work on audio-visual navigation targets a single static sound in noise-free audio environments and struggles to generalize to unheard sounds. We introduce the novel dynamic audio-visual navigation benchmark in which an embodied AI agent must catch a moving sound source in an unmapped environment in the presence of distractors and noisy sounds. We propose an end-to-end reinforcement learning approach that relies on a multi-modal architecture that fuses the spatial audio-visual information from a binaural audio signal and spatial occupancy maps to encode the features needed to learn a robust navigation policy for our new complex task settings. We demonstrate that our approach outperforms the current state-of-the-art with better generalization to unheard sounds and better robustness to noisy scenarios on the two challenging 3D scanned real-world datasets Replica and Matterport3D, for the static and dynamic audio-visual navigation benchmarks. Our novel benchmark will be made available at http://dav-nav.cs.uni-freiburg.de.
