Title
A Multi-Variate Triple-Regression Forecasting Algorithm for Long-Term Customized Allergy Season Prediction
Authors
Abstract
In this paper, we propose a novel multi-variate algorithm that uses a triple-regression methodology to predict the airborne-pollen allergy season over the long term, customized for each patient. To improve prediction accuracy, we first perform pre-processing to integrate historical pollen-concentration data with various inferential signals from other covariates, such as meteorological data. We then propose a novel algorithm comprising three regression stages. In Stage 1, a regression model that predicts the start/end date of an airborne-pollen allergy season is trained on a feature matrix extracted with a rolling window from 12 time series of the covariates. In Stage 2, a regression model that predicts the corresponding uncertainty is trained on the feature matrix and the prediction result from Stage 1. In Stage 3, a weighted linear regression model is built on the prediction results from Stages 1 and 2. It is observed and proved that Stage 3 contributes to the improved forecasting accuracy and reduced uncertainty of the multi-variate triple-regression algorithm. Because allergy sensitivity differs between patients, the triggering pollen concentration, which defines the allergy season, can be customized individually. In our backtesting, the algorithm achieved a mean absolute error (MAE) of 4.7 days. We conclude that this algorithm could be applicable to both generic and long-term forecasting problems.
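To make the three-stage structure described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single hypothetical feature instead of the feature matrix built from 12 covariate time series, takes the absolute Stage 1 error as the uncertainty target in Stage 2, and uses inverse-variance weights in Stage 3; the abstract specifies none of these choices. All names (`fit_line`, `fit_triple_regression`, `forecast`, `min_sigma`) and the data values are illustrative.

```python
def fit_line(x, y, w=None):
    """Weighted least-squares fit of y = a + b*x; returns (a, b).

    Requires x to contain at least two distinct values.
    """
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b


def predict(model, x):
    a, b = model
    return [a + b * xi for xi in x]


def fit_triple_regression(feature, target, min_sigma=0.5):
    # Stage 1: point prediction of the season start day from the feature.
    stage1 = fit_line(feature, target)
    pred1 = predict(stage1, feature)
    # Stage 2 (assumption): model the uncertainty as the absolute Stage 1
    # error, regressed on the Stage 1 prediction.
    abs_err = [abs(t - p) for t, p in zip(target, pred1)]
    stage2 = fit_line(pred1, abs_err)
    # Floor the predicted uncertainty so the weights stay finite.
    sigma = [max(s, min_sigma) for s in predict(stage2, pred1)]
    # Stage 3 (assumption): weighted linear regression of the target on the
    # Stage 1 prediction, with weights 1 / sigma^2.
    weights = [1.0 / s ** 2 for s in sigma]
    stage3 = fit_line(pred1, target, weights)
    return stage1, stage2, stage3


def forecast(models, feature, min_sigma=0.5):
    """Return (final predictions, predicted uncertainties) for new features."""
    stage1, stage2, stage3 = models
    pred1 = predict(stage1, feature)
    sigma = [max(s, min_sigma) for s in predict(stage2, pred1)]
    return predict(stage3, pred1), sigma


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic illustration only: a made-up spring temperature feature and
# made-up season start days (day of year). These are not data from the paper.
spring_temp = [8.0, 9.5, 11.0, 12.5, 10.0, 13.0]
start_day = [112.0, 108.0, 101.0, 97.0, 106.0, 95.0]

models = fit_triple_regression(spring_temp, start_day)
fitted, sigma = forecast(models, spring_temp)
in_sample_mae = mean_absolute_error(start_day, fitted)
```

In this sketch the Stage 3 weights let years with a small predicted uncertainty dominate the final fit, which is one plausible way a third stage could reduce error. A backtest in the spirit of the abstract would fit on past seasons only and compute `mean_absolute_error` on held-out seasons, rather than in sample as above.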