Paper Title

A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in Aerial View

Authors

Zhiyuan Zhao, Tao Han, Junyu Gao, Qi Wang, Xuelong Li

Abstract

Drone shooting can be applied in dynamic traffic monitoring, object detection and tracking, and other vision tasks. The variability of the shooting location adds some intractable challenges to these missions, such as varying scale, unstable exposure, and scene migration. In this paper, we strive to tackle the above challenges and automatically understand the crowd from the visual data collected from drones. First, to alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed, which extracts optical flow and frame difference information as an additional branch. Besides, to improve the model's generalization ability at different scales and times, we randomly combine a variety of data transformation methods to simulate some unseen environments. To tackle the crowd density estimation problem under extremely dark environments, we introduce synthetic data generated by the game Grand Theft Auto V (GTAV). Experimental results show the effectiveness of the virtual data. Our method wins the challenge with a mean absolute error (MAE) of 12.70. Moreover, a comprehensive ablation study is conducted to explore each component's contribution.
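The abstract's motion branch relies on the fact that frame-difference maps respond to moving people while staying near zero on static background. The following is a minimal NumPy sketch of that idea; the function names and the channel-stacking fusion are illustrative assumptions, not the paper's actual architecture (which feeds optical flow and frame difference through a separate network branch).

```python
import numpy as np

def frame_difference(prev_frame, curr_frame):
    """Absolute per-pixel difference between two consecutive grayscale frames.
    Moving objects yield high responses; static background cancels out,
    which is what helps suppress cross-scene background noise."""
    return np.abs(curr_frame.astype(np.float32) - prev_frame.astype(np.float32))

def build_bipath_input(prev_frame, curr_frame):
    """Hypothetical fusion: stack the appearance frame with its motion map
    into a 2-channel array (the paper instead uses a dedicated branch)."""
    motion = frame_difference(prev_frame, curr_frame)
    return np.stack([curr_frame.astype(np.float32), motion], axis=0)

# Toy example: a 4x4 "scene" where one pixel changes between frames.
prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[1, 2] = 255  # a moving object appears
x = build_bipath_input(prev, curr)
print(x.shape)     # (2, 4, 4)
print(x[1].max())  # 255.0
```

In the full model, a dense optical-flow field (e.g. from an off-the-shelf estimator) would play the same role as the difference map but also encode motion direction and magnitude.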
