Paper Title
Intrinsically Motivated Reinforcement Learning based Recommendation with Counterfactual Data Augmentation
Paper Authors
Paper Abstract
Deep reinforcement learning (DRL) has proven efficient at capturing users' dynamic interests in the recent literature. However, training a DRL agent is challenging because of the sparse environment in recommender systems (RS): the agent must divide its time between exploring informative user-item interaction trajectories and exploiting existing trajectories for policy learning. This exploration-exploitation trade-off significantly affects recommendation performance when the environment is sparse, and balancing it is even more challenging in DRL-based RS, where the agent needs to deeply explore informative trajectories and exploit them efficiently. As a step toward addressing this issue, we design a novel intrinsically motivated reinforcement learning method to increase the capability of exploring informative interaction trajectories in the sparse environment; these trajectories are further enriched via a counterfactual augmentation strategy for more efficient exploitation. Extensive experiments on six offline datasets and three online simulation platforms demonstrate the superiority of our model over a set of existing state-of-the-art methods.
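The abstract does not specify how the intrinsic motivation is computed, so the sketch below illustrates only the general idea behind such methods, not this paper's formulation: a curiosity-style bonus (the prediction error of a learned forward model) is added to the sparse extrinsic reward, so that novel user-item transitions are rewarded and exploration is encouraged. All names here (CuriosityModule, beta, the linear forward model) are hypothetical stand-ins.

```python
import numpy as np


class CuriosityModule:
    """Curiosity-style intrinsic reward (illustrative, not the paper's method):
    the prediction error of a learned forward model f(s, a) -> s' serves as an
    exploration bonus. Novel transitions are poorly predicted and thus rewarded;
    familiar ones fade as the model improves."""

    def __init__(self, state_dim, action_dim, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # A simple linear forward model stands in for a neural network here.
        self.W = rng.normal(scale=0.1, size=(state_dim + action_dim, state_dim))
        self.lr = lr

    def reward(self, state, action_onehot, next_state):
        x = np.concatenate([state, action_onehot])
        err = next_state - x @ self.W          # forward-model residual
        self.W += self.lr * np.outer(x, err)   # online gradient update of the model
        return float(np.sum(err ** 2))         # prediction error = intrinsic bonus


# Usage: blend the intrinsic bonus with the (often zero) extrinsic reward.
curiosity = CuriosityModule(state_dim=8, action_dim=4)
beta = 0.1  # weight of the intrinsic term (hypothetical value)
s, a, s_next = np.zeros(8), np.eye(4)[1], np.full(8, 0.5)
r_extrinsic = 0.0  # sparse environment: most interactions yield no feedback
r_total = r_extrinsic + beta * curiosity.reward(s, a, s_next)
```

In a sparse RS environment most interactions return zero extrinsic reward, so the intrinsic term dominates early training and steers the agent toward unfamiliar trajectories; the counterfactual augmentation step mentioned in the abstract would then enrich the collected trajectories before policy learning, but its details are not given in this section.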