Paper Title
Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction
Authors
Abstract
Rich user behavior data has been proven to be of great value for click-through rate (CTR) prediction tasks, especially in industrial applications such as recommender systems and online advertising. Both industry and academia have paid much attention to this topic and have proposed different approaches to modeling long sequential user behavior data. Among them, the memory-network-based model MIMN, proposed by Alibaba, achieves SOTA through the co-design of the learning algorithm and the serving system. MIMN is the first industrial solution that can model sequential user behavior data with lengths scaling up to 1000. However, MIMN fails to precisely capture user interests with respect to a specific candidate item when the length of the user behavior sequence increases further, say, by 10 times or more. This challenge exists widely in previously proposed approaches. In this paper, we tackle this problem by designing a new modeling paradigm, which we name the Search-based Interest Model (SIM). SIM extracts user interests with two cascaded search units. (i) The General Search Unit performs a general search over the raw, arbitrarily long sequential behavior data, using query information from the candidate item, and obtains a Sub user Behavior Sequence (SBS) that is relevant to the candidate item. (ii) The Exact Search Unit models the precise relationship between the candidate item and the SBS. This cascaded search paradigm gives SIM a better ability to model lifelong sequential behavior data in terms of both scalability and accuracy. Apart from the learning algorithm, we also share our hands-on experience of implementing SIM in large-scale industrial systems. Since 2019, SIM has been deployed in the display advertising system at Alibaba, bringing a 7.1% CTR lift and a 4.4% RPM lift, which is significant to the business. Now serving the main traffic in our real system, SIM models user behavior data with a maximum length of up to 54000, pushing SOTA by 54x.
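The two-stage cascade described above can be illustrated with a small sketch. This is not the authors' implementation, and it makes several assumptions:

- The General Search Unit is modeled as a "hard search" that keeps only past behaviors sharing the candidate item's category.
- The Exact Search Unit is reduced to single-head scaled dot-product attention between the candidate embedding and the retrieved SBS, with no time-interval information.
- The names `Behavior`, `general_search_unit`, `exact_search_unit`, `sim_user_interest`, and `max_sbs_len` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Behavior:
    """One past user behavior (hypothetical schema for illustration)."""
    item_id: int
    category: int
    embedding: List[float]


def general_search_unit(behaviors: List[Behavior], candidate_category: int,
                        max_len: int) -> List[Behavior]:
    """Assumed hard-search GSU: keep behaviors whose category matches the
    candidate's, then truncate to the most recent `max_len` entries.
    `behaviors` is assumed to be ordered oldest first."""
    relevant = [b for b in behaviors if b.category == candidate_category]
    return relevant[-max_len:] if max_len > 0 else []


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def exact_search_unit(sbs: List[Behavior], candidate_embedding: List[float]) -> List[float]:
    """Simplified ESU: single-head scaled dot-product attention over the SBS,
    returning the attention-weighted sum of behavior embeddings."""
    dim = len(candidate_embedding)
    if not sbs:
        return [0.0] * dim  # no relevant history: zero interest vector
    scores = [dot(b.embedding, candidate_embedding) / math.sqrt(dim) for b in sbs]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * b.embedding[i] for w, b in zip(weights, sbs)) for i in range(dim)]


def sim_user_interest(behaviors: List[Behavior], candidate_category: int,
                      candidate_embedding: List[float],
                      max_sbs_len: int = 50) -> Tuple[List[Behavior], List[float]]:
    """Cascade: the GSU narrows the lifelong sequence to an SBS, then the ESU attends over it."""
    sbs = general_search_unit(behaviors, candidate_category, max_sbs_len)
    return sbs, exact_search_unit(sbs, candidate_embedding)


# Usage example with a tiny, made-up behavior history.
history = [
    Behavior(1, category=7, embedding=[1.0, 0.0]),
    Behavior(2, category=3, embedding=[0.0, 1.0]),
    Behavior(3, category=7, embedding=[0.5, 0.5]),
]
sbs, interest = sim_user_interest(history, candidate_category=7, candidate_embedding=[1.0, 0.0])
print([b.item_id for b in sbs])  # [1, 3]
```

The point of the cascade is cost. The cheap first stage shrinks an arbitrarily long history to a short SBS, so the more expensive attention in the second stage only runs over a bounded number of behaviors.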