Paper Title
ReLoop: A Self-Correction Continual Learning Loop for Recommender Systems
Paper Authors
Paper Abstract
Deep learning-based recommendation has become a widely adopted technique in various online applications. Typically, a deployed model undergoes frequent re-training to capture users' dynamic behaviors from newly collected interaction logs. However, the current model training process only uses user feedback as labels and fails to take into account the errors made in previous recommendations. Inspired by the intuition that humans usually reflect on and learn from their mistakes, in this paper we attempt to build a self-correction learning loop (dubbed ReLoop) for recommender systems. In particular, a new customized loss is employed to encourage every new model version to reduce prediction errors relative to the previous model version during training. Our ReLoop learning framework enables a continual self-correction process in the long run and is thus expected to obtain better performance than existing training strategies. Both offline experiments and an online A/B test have been conducted to validate the effectiveness of ReLoop.
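
The abstract does not spell out the form of the customized loss, so the following is only a minimal sketch of one way such a self-correction term could be written for binary click labels. It assumes a hinge-style penalty that is non-zero only when the new model's prediction is worse than the previous model's on a sample, blended with ordinary binary cross-entropy. The function name `self_correction_loss`, the weight `alpha`, and the hinge form itself are illustrative assumptions, not details taken from the abstract.

```python
import math


def self_correction_loss(y_true, y_new, y_prev, alpha=0.5, eps=1e-7):
    """Blend binary cross-entropy with a hinge-style self-correction penalty.

    y_true: list of 0/1 labels from the newly collected logs
    y_new:  predicted probabilities from the model being trained
    y_prev: predicted probabilities from the previously deployed model
    alpha:  hypothetical weight on the self-correction term (assumption)
    """
    if not (len(y_true) == len(y_new) == len(y_prev)):
        raise ValueError("all inputs must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for y, p, q in zip(y_true, y_new, y_prev):
        # Clamp the probability so log() never sees 0 or 1.
        p_clamped = min(max(p, eps), 1.0 - eps)
        cross_entropy = -(y * math.log(p_clamped)
                          + (1 - y) * math.log(1.0 - p_clamped))

        # Assumed self-correction term:
        #   positive sample: penalize only if the new score falls below the old one
        #   negative sample: penalize only if the new score rises above the old one
        correction = y * max(q - p, 0.0) + (1 - y) * max(p - q, 0.0)

        total += alpha * correction + (1.0 - alpha) * cross_entropy
    return total / len(y_true)


# The new model improves on both samples, so the correction term vanishes.
improved = self_correction_loss([1, 0], [0.9, 0.1], [0.6, 0.4], alpha=1.0)

# The new model is worse than the previous one on both samples.
regressed = self_correction_loss([1, 0], [0.6, 0.4], [0.9, 0.1], alpha=1.0)
```

With `alpha=1.0` only the assumed correction term is active, which makes the contrast between the two calls easy to see; with `alpha=0.0` the function reduces to plain binary cross-entropy, i.e. re-training on labels alone.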