Paper title
A monitoring framework for deployed machine learning models with supply chain examples
Paper authors
Paper abstract
Actively monitoring machine learning models during production operations helps ensure prediction quality and enables the detection and remediation of unexpected or undesired conditions. Monitoring models already deployed in big data environments brings the additional challenges of adding monitoring in parallel to the existing modelling workflow and controlling resource requirements. In this paper, we describe (1) a framework for monitoring machine learning models; and (2) its implementation for a big data supply chain application. We use our implementation to study drift in model features, predictions, and performance on three real data sets. We compare hypothesis-testing and information-theoretic approaches to drift detection in features and predictions, using the Kolmogorov-Smirnov distance and the Bhattacharyya coefficient. Results showed that model performance was stable over the evaluation period. Features and predictions showed statistically significant drifts; however, these drifts were not linked to changes in model performance during the time of our study.
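The two drift statistics the abstract compares can be sketched as follows. This is a minimal NumPy illustration on synthetic data, not the paper's implementation; the function names, histogram binning, and reference/current windowing are our own assumptions. The Kolmogorov-Smirnov distance is the maximum gap between two empirical CDFs (the hypothesis-testing view), while the Bhattacharyya coefficient measures histogram overlap (the information-theoretic view, 1 for identical distributions, 0 for disjoint ones).

```python
import numpy as np

def ks_distance(p_samples, q_samples):
    """Two-sample Kolmogorov-Smirnov distance: max |F_p(x) - F_q(x)|."""
    grid = np.sort(np.concatenate([p_samples, q_samples]))
    cdf_p = np.searchsorted(np.sort(p_samples), grid, side="right") / len(p_samples)
    cdf_q = np.searchsorted(np.sort(q_samples), grid, side="right") / len(q_samples)
    return float(np.max(np.abs(cdf_p - cdf_q)))

def bhattacharyya_coefficient(p_samples, q_samples, bins=20):
    """Bhattacharyya coefficient sum(sqrt(p_i * q_i)) over a shared histogram."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p_hist, edges = np.histogram(p_samples, bins=bins, range=(lo, hi))
    q_hist, _ = np.histogram(q_samples, bins=edges)
    p = p_hist / p_hist.sum()
    q = q_hist / q_hist.sum()
    return float(np.sum(np.sqrt(p * q)))

# Simulated monitoring windows: a reference window and a mildly shifted
# current window standing in for feature drift.
rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 1000)
current = rng.normal(0.3, 1.0, 1000)

ks = ks_distance(baseline, current)           # larger => more drift
bc = bhattacharyya_coefficient(baseline, current)  # smaller => more drift
```

In a monitoring setting, one would compute these per feature (and on the prediction distribution) between a training-time reference window and each production window, then alert when the statistic crosses a threshold.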