Paper Title

TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos

Paper Authors

Tushar Sangam, Ishan Rajendrakumar Dave, Waqas Sultani, Mubarak Shah

Paper Abstract

Drone-to-drone detection using visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones. However, existing methods are computationally costly, follow non-end-to-end optimization, and have complex multi-stage pipelines, making them less suitable for real-time deployment on edge devices. In this work, we propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency. We utilize the CSPDarkNet-53 network to learn object-related spatial features and the VideoSwin model to improve drone detection in challenging scenarios by learning spatio-temporal dependencies of drone motion. Our method achieves state-of-the-art performance on three challenging real-world datasets (Average Precision@0.5 IoU): NPS 0.95, FLDrones 0.75, and AOT 0.80, and higher throughput than previous methods. We also demonstrate its deployment capability on edge devices and its usefulness in detecting drone-collision (encounter) scenarios. Project: https://tusharsangam.github.io/TransVisDrone-project-page/
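To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of the two-stage design it outlines: a per-frame spatial backbone feeding a temporal attention stage over the frame sequence, followed by a per-cell detection head, trainable end to end. This is an illustration under stated assumptions, not the authors' implementation: `SpatialBackbone`, `TemporalMixer`, `DroneDetectorSketch`, and all layer sizes are hypothetical stand-ins for the roles that CSPDarkNet-53 and VideoSwin play in TransVisDrone.

```python
# Minimal sketch of a spatial-then-temporal detection pipeline.
# SpatialBackbone stands in for CSPDarkNet-53 and TemporalMixer for
# VideoSwin (both are assumptions, not the paper's actual modules).
import torch
import torch.nn as nn


class SpatialBackbone(nn.Module):
    """Toy convolutional backbone extracting per-frame spatial features."""
    def __init__(self, out_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(128, out_dim, 3, stride=2, padding=1), nn.SiLU(),
        )

    def forward(self, x):                  # x: (B*T, 3, H, W)
        return self.net(x)                 # (B*T, C, H/8, W/8)


class TemporalMixer(nn.Module):
    """Self-attention across the T frames at each spatial location."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, feats, b, t):        # feats: (B*T, C, h, w)
        _, c, h, w = feats.shape
        # Regroup tokens so attention runs over the T frames per location.
        x = feats.view(b, t, c, h * w).permute(0, 3, 1, 2).reshape(b * h * w, t, c)
        x = self.encoder(x)
        return x.reshape(b, h * w, t, c).permute(0, 2, 3, 1).reshape(b * t, c, h, w)


class DroneDetectorSketch(nn.Module):
    """Spatial + temporal features with a toy head predicting one
    (objectness, cx, cy, w, h) vector per spatial cell per frame."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.spatial = SpatialBackbone(dim)
        self.temporal = TemporalMixer(dim)
        self.head = nn.Conv2d(dim, 5, kernel_size=1)

    def forward(self, clip):               # clip: (B, T, 3, H, W)
        b, t, _, _, _ = clip.shape
        feats = self.spatial(clip.flatten(0, 1))
        feats = self.temporal(feats, b, t)
        return self.head(feats).view(b, t, 5, feats.shape[-2], feats.shape[-1])


if __name__ == "__main__":
    clip = torch.randn(2, 4, 3, 256, 256)  # 2 clips of 4 frames each
    out = DroneDetectorSketch()(clip)
    print(out.shape)                        # torch.Size([2, 4, 5, 32, 32])
```

Running attention over the frame axis at each spatial location is one simple way to encode the motion cues the abstract attributes to the temporal stage; VideoSwin itself uses shifted-window 3D attention, which this sketch replaces with plain per-location temporal attention for brevity.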
