Paper Title
UserBERT: Modeling Long- and Short-Term User Preferences via Self-Supervision
Paper Authors
Paper Abstract
E-commerce platforms generate vast amounts of customer behavior data, such as clicks and purchases, from millions of unique users every day. However, effectively using this data for behavior understanding tasks is challenging because there are usually not enough labels to learn from all users in a supervised manner. This paper extends the BERT model to e-commerce user data for pre-training representations in a self-supervised manner. By viewing user actions in sequences as analogous to words in sentences, we extend the existing BERT model to user behavior data. Further, our model adopts a unified structure to simultaneously learn from long-term and short-term user behavior, as well as user attributes. We propose methods for the tokenization of different types of user behavior sequences, the generation of input representation vectors, and a novel pretext task to enable the pre-trained model to learn from its own input, eliminating the need for labeled training data. Extensive experiments demonstrate that the learned representations result in significant improvements when transferred to three different real-world tasks, particularly compared to task-specific modeling and multi-task representation learning.
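To make the BERT-style setup described in the abstract concrete, below is a minimal PyTorch sketch of self-supervised pre-training on tokenized user behavior. All specifics are assumptions for illustration: the vocabulary size, the `MASK_ID`/`PAD_ID` token ids, the segment ids separating attributes from long- and short-term sequences, and the use of masked-action prediction as the pretext task (the paper's actual pretext task is described as novel and is not specified in the abstract).

```python
# Hedged sketch: BERT-style masked-action pre-training over user behavior
# sequences. Hypothetical details: token ids, vocab size, segment layout,
# and masked prediction standing in for the paper's unspecified pretext task.
import torch
import torch.nn as nn
import torch.nn.functional as F

PAD_ID, MASK_ID, VOCAB = 0, 1, 10_000  # hypothetical special ids / vocab size

class UserBehaviorEncoder(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=2, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, d_model, padding_idx=PAD_ID)
        self.seg = nn.Embedding(3, d_model)   # 0=attributes, 1=long-term, 2=short-term
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)  # scores each position over the action vocab

    def forward(self, tokens, segments):
        pos = torch.arange(tokens.size(1), device=tokens.device)
        h = self.tok(tokens) + self.seg(segments) + self.pos(pos)
        h = self.enc(h, src_key_padding_mask=tokens.eq(PAD_ID))
        return self.head(h)

# One self-supervised step: mask ~15% of actions, train to recover them.
model = UserBehaviorEncoder()
tokens = torch.randint(2, VOCAB, (8, 64))        # a batch of tokenized behavior sequences
segments = torch.ones_like(tokens)               # all long-term here, for brevity
labels = tokens.clone()
mask = torch.rand_like(tokens, dtype=torch.float) < 0.15
labels[~mask] = -100                             # loss only on masked positions
tokens = tokens.masked_fill(mask, MASK_ID)
logits = model(tokens, segments)                 # (batch, seq_len, VOCAB)
loss = F.cross_entropy(logits.transpose(1, 2), labels, ignore_index=-100)
loss.backward()
```

Because the loss is computed from the sequence itself, no task labels are needed; the encoder's output representations could then be transferred to downstream tasks in the way the abstract describes.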