Paper title
Learning spatiotemporal features from incomplete data for traffic flow prediction using hybrid deep neural networks
Paper authors
Abstract
Urban traffic flow prediction using data-driven models can play an important role in route planning and in preventing congestion on highways. These methods use data collected from traffic recording stations at different timestamps to predict the future state of traffic. Hence, data collection, transmission, storage, and extraction techniques can have a significant impact on the performance of a traffic flow model. On the other hand, a comprehensive database makes it possible to use complex yet reliable predictive models such as deep learning methods. However, most of these methods have difficulty handling missing values and outliers. This study focuses on hybrid deep neural networks for predicting traffic flow from California Freeway Performance Measurement System (PeMS) data with missing values. The proposed networks combine recurrent neural networks (RNNs), which capture the temporal dependencies in the data recorded at each station, with convolutional neural networks (CNNs), which capture the spatial correlations among adjacent stations. Various architecture configurations with series and parallel connections between RNNs and CNNs are considered, and several prevalent data imputation techniques are used to examine the robustness of the hybrid networks to missing values. A comprehensive analysis performed on two different PeMS datasets indicates that the proposed series-parallel hybrid network with the mean imputation technique achieves the lowest error in predicting traffic flow and, when applied to incomplete test data, is robust to missing values up to a 21% missing ratio in both the complete and the incomplete training data scenarios.
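As an illustration of the mean imputation technique named in the abstract, the following is a minimal sketch rather than the authors' implementation. It assumes that missing readings are marked as NaN and that each station's gaps are filled with the mean of that station's observed values; the abstract does not say whether the mean is taken per station or over the whole dataset. The function names `mean_impute` and `missing_ratio` and the sample flows are hypothetical.

```python
import math


def mean_impute(readings):
    """Fill NaN entries in each station's series with that station's observed mean.

    `readings` is a list of per-station lists of flow values (assumed layout).
    """
    imputed = []
    for series in readings:
        observed = [v for v in series if not math.isnan(v)]
        # Assumption of this sketch: a station with no observed values is filled with 0.0.
        fill = sum(observed) / len(observed) if observed else 0.0
        imputed.append([fill if math.isnan(v) else v for v in series])
    return imputed


def missing_ratio(readings):
    """Return the fraction of entries that are missing (NaN) across all stations."""
    total = sum(len(series) for series in readings)
    missing = sum(1 for series in readings for v in series if math.isnan(v))
    return missing / total if total else 0.0


nan = float("nan")
# Hypothetical flow counts for two stations over four timestamps.
flows = [
    [120.0, nan, 130.0, 125.0],
    [80.0, 90.0, nan, nan],
]
filled = mean_impute(flows)
print(filled)
print(missing_ratio(flows))
```

In a setup like the one the abstract describes, the imputed series would then be passed to the prediction network, and the missing ratio would be the quantity varied when testing robustness.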