Paper Title

Concept Drift and Covariate Shift Detection Ensemble with Lagged Labels

Authors

Yiming Xu, Diego Klabjan

Abstract

In model serving, keeping one fixed model throughout the entire, often life-long, inference process is usually detrimental to model performance: the data distribution evolves over time, so a model trained on historical data becomes unreliable. It is important to detect changes and retrain the model in time. Existing methods generally have three weaknesses: 1) they use only the classification error rate as a signal, 2) they assume ground-truth labels are available immediately after the features of a sample are received, and 3) they cannot decide what data to use to retrain the model when a change occurs. We address the first problem by using six different signals that capture a wide range of data characteristics, and the second by allowing label lag, where the label for a given set of features arrives only after a delay. For the third problem, our proposed method automatically decides, based on the signals, what data to use for retraining. Extensive experiments on structured and unstructured data for different types of data change establish that our method consistently outperforms state-of-the-art methods by a large margin.
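To make the label-lag setting concrete, the following is a minimal sketch and not the authors' method. It tracks only two toy signals (a z-score on the windowed feature mean, which needs no labels, and a windowed error rate computed from labels that arrive `lag` steps late), whereas the paper uses six signals that are not specified in the abstract. The class `LaggedDriftMonitor`, the function `run_stream`, the thresholds, and the synthetic stream are all hypothetical choices made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class LaggedDriftMonitor:
    """Toy monitor: a label-free feature signal plus a lagged-label error signal."""

    def __init__(self, reference, window=50, z_threshold=3.0, error_threshold=0.3):
        self.ref_mean = mean(reference)
        self.ref_std = pstdev(reference) or 1.0
        self.window = window
        self.z_threshold = z_threshold
        self.error_threshold = error_threshold
        self.features = deque(maxlen=window)  # most recent feature values
        self.errors = deque(maxlen=window)    # most recent 0/1 errors from lagged labels
        self.pending = {}                     # sample id -> prediction awaiting its label

    def observe_features(self, sample_id, x, prediction):
        self.features.append(x)
        self.pending[sample_id] = prediction

    def observe_label(self, sample_id, label):
        prediction = self.pending.pop(sample_id)
        self.errors.append(int(prediction != label))

    def covariate_shift(self):
        # z-test of the windowed feature mean against the reference mean.
        if len(self.features) < self.window:
            return False
        std_error = self.ref_std / self.window ** 0.5
        return abs(mean(self.features) - self.ref_mean) / std_error > self.z_threshold

    def concept_drift(self):
        # Windowed error rate over the labels received so far.
        if len(self.errors) < self.window:
            return False
        return mean(self.errors) > self.error_threshold


def run_stream(xs, ys, predict, monitor, lag):
    """Feed features at time t and the label of sample t - lag; return first alarm times."""
    first = {"covariate": None, "concept": None}
    for t, x in enumerate(xs):
        monitor.observe_features(t, x, predict(x))
        if t >= lag:
            monitor.observe_label(t - lag, ys[t - lag])
        if first["covariate"] is None and monitor.covariate_shift():
            first["covariate"] = t
        if first["concept"] is None and monitor.concept_drift():
            first["concept"] = t
    return first


# Synthetic stream (assumption): at t = 200 the features shift by +3 and all labels become 0.
xs = [float(t % 2) for t in range(200)] + [float(t % 2) + 3.0 for t in range(200)]
ys = [int(x > 0.5) for x in xs[:200]] + [0] * 200


def predict(x):
    # Fixed "historical" model: threshold learned before the change.
    return int(x > 0.5)


reference = [0.0, 1.0] * 50
alarms = run_stream(xs, ys, predict, LaggedDriftMonitor(reference), lag=30)
no_lag = run_stream(xs, ys, predict, LaggedDriftMonitor(reference), lag=0)
print(alarms)  # {'covariate': 203, 'concept': 245}
print(no_lag)  # {'covariate': 203, 'concept': 215}
```

In this toy stream the label-free signal fires three steps after the change, while the error-rate signal is delayed by exactly the label lag (245 versus 215). That gap is one way to see why relying on the error rate alone is limiting when labels arrive late.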
