Paper Title
A Frequency-Based Learning-To-Rank Approach for Personal Digital Traces
Paper Authors
Paper Abstract
Personal digital traces are constantly produced by connected devices, internet services and interactions. These traces are typically small, heterogeneous and stored in various locations in the cloud or on local devices, which makes it challenging for users to interact with and search their own data. By adopting a multidimensional data model based on the six natural questions -- what, when, where, who, why and how -- to represent and unify heterogeneous personal digital traces, we propose a learning-to-rank approach that uses the state-of-the-art LambdaMART algorithm and frequency-based features. These features leverage the correlation between content (what), users (who), time (when), location (where) and data source (how) to improve the accuracy of search results. Because personal training data is not publicly available, we build our own training sets by combining known-item query generation techniques with an unsupervised ranking model (field-based BM25). Experiments on a publicly available email collection and on a collection of personal digital traces from a real user show that the frequency-based learning approach improves search accuracy compared with traditional search tools.
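To make the unsupervised component concrete, the following is a minimal sketch of field-based scoring, assuming that "field-based BM25" can be read as a weighted sum of per-field BM25 scores over the dimensions of the data model. The toy traces, the field weights, and the function names `bm25_field` and `rank` are hypothetical illustrations and are not taken from the paper, whose exact formulation is not given in the abstract.

```python
import math
from collections import Counter

# Hypothetical traces: each maps a dimension (field) to a list of tokens.
TRACES = [
    {"what": ["lunch", "meeting", "budget"], "who": ["alice", "bob"], "where": ["cafe"]},
    {"what": ["budget", "report", "draft"], "who": ["carol"], "where": ["office"]},
    {"what": ["birthday", "party"], "who": ["alice"], "where": ["home"]},
]


def bm25_field(query, docs, field, k1=1.2, b=0.75):
    """Return one BM25 score per document, computed over a single field."""
    n = len(docs)
    lengths = [len(d.get(field, [])) for d in docs]
    avg_len = sum(lengths) / n if n else 0.0
    terms = set(query)
    # Document frequency of each query term within this field.
    df = {t: sum(1 for d in docs if t in d.get(field, [])) for t in terms}
    scores = []
    for d, length in zip(docs, lengths):
        tf = Counter(d.get(field, []))
        score = 0.0
        for t in terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            # Length normalisation relative to the average field length.
            norm = 1 - b + b * (length / avg_len) if avg_len else 1.0
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * norm)
        scores.append(score)
    return scores


def rank(query, docs, field_weights):
    """Rank document indices by a weighted sum of per-field BM25 scores."""
    totals = [0.0] * len(docs)
    for field, weight in field_weights.items():
        for i, s in enumerate(bm25_field(query, docs, field)):
            totals[i] += weight * s
    return sorted(range(len(docs)), key=lambda i: totals[i], reverse=True)


# Assumed weights; in practice these would be tuned per collection.
ranking = rank(["alice", "budget"], TRACES, {"what": 1.0, "who": 1.0, "where": 0.5})
print(ranking)
```

In this toy example the first trace matches the query in both the "what" and "who" fields, so it ranks first. A ranking of this kind could supply the relevance labels for generated known-item queries, on which a learning-to-rank model is then trained.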