Paper title
Backtesting the predictability of COVID-19
Authors
Abstract
The advent of the COVID-19 pandemic has instigated unprecedented changes in many countries around the globe, putting a significant burden on the health sectors, affecting macroeconomic conditions, and altering social interactions among the population. In response, the academic community has produced multiple forecasting models, approaches and algorithms to predict the different indicators of COVID-19, such as the number of confirmed infected cases. Yet researchers had little to no historical information about the pandemic at their disposal to inform their forecasting methods. Our work studies the predictive performance of models at various stages of the pandemic to better understand their fundamental uncertainty and the impact of data availability on such forecasts. We use historical data on COVID-19 infections from 253 regions, covering the period from 22 January 2020 to 22 June 2020, to predict, through a rolling-window backtesting framework, the cumulative number of infected cases for the next 7 and 28 days. We implement three simple models and track their root mean squared logarithmic error over this 6-month span: a baseline model that always predicts the last known value of the cumulative confirmed cases, a power growth model, and an epidemiological model called SEIRD. Prediction errors are substantially higher in the early stages of the pandemic because of limited data. Over the course of the pandemic, errors decrease slowly but steadily. The more confirmed cases a country exhibits at any point in time, the lower the error in forecasting its future confirmed cases. We emphasize the significance of a rigorous backtesting framework for accurately assessing the predictive power of such models at any point in time during the outbreak, which in turn can be used to assign the right level of certainty to these forecasts and facilitate better planning.
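The rolling-window evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`rmsle`, `last_value_forecast`, `rolling_backtest`), the `log(1 + x)` form of the error, and the choice to aggregate the error across regions at each forecast origin are all assumptions made for illustration. Only the last-value baseline is shown; the power growth and SEIRD models would plug in through the same `forecast` argument.

```python
import math


def rmsle(predicted, actual):
    """Root mean squared logarithmic error, using the common log(1 + x) form.

    Assumption: the abstract does not spell out the exact formula.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared) / len(squared))


def last_value_forecast(history, horizon):
    """Baseline model: repeat the last known cumulative count, whatever the horizon."""
    return history[-1]


def rolling_backtest(series_by_region, horizon, forecast=last_value_forecast):
    """Score `forecast` at every forecast origin.

    For each day index t, the model sees only the data up to and including t,
    predicts the cumulative count at t + horizon, and the predictions for all
    regions are scored together with RMSLE (an assumed aggregation).
    Returns a dict mapping origin index t to the error at that origin.
    """
    n_days = min(len(series) for series in series_by_region.values())
    errors = {}
    for t in range(n_days - horizon):
        predictions = [
            forecast(series[: t + 1], horizon) for series in series_by_region.values()
        ]
        actuals = [series[t + horizon] for series in series_by_region.values()]
        errors[t] = rmsle(predictions, actuals)
    return errors


if __name__ == "__main__":
    # Toy cumulative counts, invented for illustration only (not real data).
    toy = {
        "region_a": [1, 2, 4, 8, 16, 32, 64, 128, 200, 260],
        "region_b": [0, 0, 1, 1, 3, 5, 9, 14, 20, 27],
    }
    for origin, error in rolling_backtest(toy, horizon=3).items():
        print(f"origin day {origin}: RMSLE = {error:.3f}")
```

Because every forecast uses only data available at its origin, the error series shows how predictive performance would have looked at each stage of the outbreak, which is the kind of comparison the abstract draws between early and later stages.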