ESS: Learning Event-based Semantic Segmentation from Still Images

Paper Title

ESS: Learning Event-based Semantic Segmentation from Still Images

Authors

Zhaoning Sun, Nico Messikommer, Daniel Gehrig, Davide Scaramuzza

Abstract

Retrieving accurate semantic information in challenging high dynamic range (HDR) and high-speed conditions remains an open challenge for image-based algorithms due to severe image degradations. Event cameras promise to address these challenges since they feature a much higher dynamic range and are resilient to motion blur. Nonetheless, semantic segmentation with event cameras is still in its infancy, which is chiefly due to the lack of high-quality, labeled datasets. In this work, we introduce ESS (Event-based Semantic Segmentation), which tackles this problem by directly transferring the semantic segmentation task from existing labeled image datasets to unlabeled events via unsupervised domain adaptation (UDA). Compared to existing UDA methods, our approach aligns recurrent, motion-invariant event embeddings with image embeddings. For this reason, our method neither requires video data nor per-pixel alignment between images and events and, crucially, does not need to hallucinate motion from still images. Additionally, we introduce DSEC-Semantic, the first large-scale event-based dataset with fine-grained labels. We show that using image labels alone, ESS outperforms existing UDA approaches, and when combined with event labels, it even outperforms state-of-the-art supervised approaches on both DDD17 and DSEC-Semantic. Finally, ESS is general-purpose, which unlocks the vast amount of existing labeled image datasets and paves the way for new and exciting research directions in new fields previously inaccessible for event cameras.

Paper Title