Paper Title

Bayesian Multi-Scale Neural Network for Crowd Counting

Authors

Sagar, Abhinav

Abstract

Crowd counting is a challenging yet critical task in computer vision with applications ranging from public safety to urban planning. Recent advances using Convolutional Neural Networks (CNNs) that estimate density maps have shown significant success. However, accurately counting individuals in highly congested scenes remains an open problem due to severe occlusions, scale variations, and perspective distortions, where people appear at drastically different sizes across the image. In this work, we propose a novel deep learning architecture that effectively addresses these challenges. Our network integrates a ResNet-based feature extractor for capturing rich hierarchical representations, followed by a downsampling block employing dilated convolutions to preserve spatial resolution while expanding the receptive field. An upsampling block using transposed convolutions reconstructs the high-resolution density map. Central to our architecture is a novel Perspective-aware Aggregation Module (PAM) designed to enhance robustness to scale and perspective variations by adaptively aggregating multi-scale contextual information. We detail the training procedure, including the loss functions and optimization strategies used. Our method is evaluated on three widely used benchmark datasets using Mean Absolute Error (MAE) and Mean Squared Error (MSE) as evaluation metrics. Experimental results demonstrate that our model achieves superior performance compared to existing state-of-the-art methods. Additionally, we incorporate principled Bayesian inference techniques to provide uncertainty estimates along with the crowd count predictions, offering a measure of confidence in the model's outputs.
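
To make the evaluation terms in the abstract concrete, the following is a minimal sketch of how a count is typically read off a density map and how MAE and MSE are computed over a test set. It is not the paper's code: the function names are hypothetical, and `summarize_samples` only assumes that the Bayesian component yields several sampled counts per image (for example from repeated stochastic forward passes), which the abstract does not specify. Some crowd counting papers report the square root of the mean squared error under the name MSE, so `rmse` is included as well.

```python
from math import sqrt
from statistics import mean, stdev


def count_from_density(density_map):
    """Predicted count = sum of all pixel values in the density map."""
    return sum(sum(row) for row in density_map)


def mae(pred_counts, true_counts):
    """Mean Absolute Error between predicted and ground-truth counts."""
    return mean(abs(p - t) for p, t in zip(pred_counts, true_counts))


def mse(pred_counts, true_counts):
    """Mean Squared Error between predicted and ground-truth counts."""
    return mean((p - t) ** 2 for p, t in zip(pred_counts, true_counts))


def rmse(pred_counts, true_counts):
    """Root of the mean squared error (variant some papers call MSE)."""
    return sqrt(mse(pred_counts, true_counts))


def summarize_samples(sampled_counts):
    """Assumed uncertainty summary: mean and sample standard deviation
    of several sampled counts for one image."""
    return mean(sampled_counts), stdev(sampled_counts)


if __name__ == "__main__":
    toy_map = [[0.5, 0.5], [1.0, 0.0]]  # toy 2x2 density map
    print("count:", count_from_density(toy_map))
    print("MAE:", mae([10, 20], [12, 17]))
    print("MSE:", mse([10, 20], [12, 17]))
    print("mean, std:", summarize_samples([98.0, 100.0, 102.0]))
```

A wide standard deviation relative to the mean count would flag an image where the prediction deserves less trust, which is the kind of confidence measure the abstract describes.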
