Paper Title
Benchmarking Offline Reinforcement Learning Algorithms for E-Commerce Order Fraud Evaluation
Paper Authors
Paper Abstract
Amazon and other e-commerce sites must employ mechanisms to protect their millions of customers from fraud, such as unauthorized use of credit cards. One such mechanism is order fraud evaluation, where systems evaluate orders for fraud risk and either "pass" the order or take an action to mitigate high risk. Order fraud evaluation systems typically use binary classification models that distinguish fraudulent from legitimate orders to assess risk and take action. We seek to devise a system that considers both the financial losses from fraud and long-term customer satisfaction, which may be impaired when incorrect actions are applied to legitimate customers. We propose that taking actions to optimize long-term impact can be formulated as a Reinforcement Learning (RL) problem. Standard RL methods require online interaction with an environment to learn, but this is not desirable in high-stakes applications like order fraud evaluation. Offline RL algorithms learn from logged data collected from the environment, without the need for online interaction, making them suitable for our use case. We show that offline RL methods outperform traditional binary classification solutions in SimStore, a simplified e-commerce simulation that incorporates order fraud risk. We also propose a novel approach to training offline RL policies that adds a new loss term during training, to better align policy exploration with taking correct actions.
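The abstract does not specify the form of the added loss term, so the following is only a minimal sketch of the general idea under stated assumptions: a standard one-step TD objective on logged (offline) transitions, augmented with a cross-entropy term that nudges the policy's action preferences toward a known-correct action for each logged order (e.g., the ground-truth fraud label). Names such as `correct_actions` and `align_weight` are hypothetical illustration choices, not the paper's actual method.

```python
# Hedged sketch (PyTorch): offline TD loss + an auxiliary alignment term.
# Assumes a discrete action space (e.g., "pass" vs. "mitigate") and a logged
# dataset that also records a correct action per transition.
import torch
import torch.nn.functional as F

def combined_loss(q_net, target_net, batch, gamma=0.99, align_weight=0.5):
    """batch: dict of tensors with keys
    states [B, d], actions [B] (long), rewards [B], next_states [B, d],
    dones [B] (float), correct_actions [B] (long, hypothetical labels)."""
    s, a = batch["states"], batch["actions"]
    r, s_next, done = batch["rewards"], batch["next_states"], batch["dones"]

    # Standard one-step TD error on the logged transitions (no online interaction).
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    td_loss = F.mse_loss(q_sa, target)

    # Auxiliary alignment term: treat Q-values as logits and penalize
    # disagreement with the known-correct action for this order.
    align_loss = F.cross_entropy(q_net(s), batch["correct_actions"])

    return td_loss + align_weight * align_loss
```

In this reading, `align_weight` trades off value accuracy against agreement with the labeled correct actions; the paper's actual weighting and loss form may differ.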